Statistical Inference

Notes from Statistical Inference conference June 2016

do the due diligence of running the right regression.
Predictive to causal
Create a model
Statistician vs Practitioner
model misspecifications - implications for downstream inference
the statistician is somewhat of a grumpy sourpuss, innate negativism of the field - not as eager to make wild claims, wants to avoid pitfalls
- no information about that quantity in the variables we're measuring
interaction with a collaborator
- investment in statistical procedures that would improve confidence interval/p-val
- be more positive in our interactions
- psychology of human interaction
- issue the caveats earlier in the process, the more integrated the statis
"lies we tell our children" - oversimplifications. in the most favorable circumstances, that's not really what you should be doing.
- no, linear regression isn't right
- how do we convey the meaning of the parameters if the model is misspecified.
Fisher: "If you call in the statistician after the experiment is done, he can only tell you what the experiment died of"

Larry Wasserman

YouTube talk at Becker Friedman Institute

Conflict between interpretability and good prediction.
- If I'm doing pure machine learning, do I really care about inference? Don't care about confidence intervals. You may not need to do standard errors or hypothesis tests.

Machine learning tends to stress predicting things well, and to do that you have to balance bias and variance.

In contrast, causal inference is a semi-parametric problem and bias is worse than variance. Economists seek to minimize bias. Bias kills you: you won't get correctly centered confidence intervals. Influence functions?
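To make the "bias kills you" remark concrete, here is a short worked formula. It assumes (my assumption, not stated in the talk) that the estimator is approximately normal, with bias b and standard error se; the symbols b, se and z are introduced here for illustration only.

```latex
% Prediction cares about the sum of the two terms:
\mathrm{MSE}(\hat\theta) = \mathrm{Bias}(\hat\theta)^2 + \mathrm{Var}(\hat\theta)

% Inference cares about the bias on its own. If
% \hat\theta \sim N(\theta + b, \mathrm{se}^2) and z = z_{1-\alpha/2}, then
P\bigl(\theta \in [\hat\theta - z\,\mathrm{se},\ \hat\theta + z\,\mathrm{se}]\bigr)
  = \Phi\!\left(z - \frac{b}{\mathrm{se}}\right) - \Phi\!\left(-z - \frac{b}{\mathrm{se}}\right)
```

This equals 1 - alpha only when b = 0. If b/se does not shrink as the sample grows, the interval stays centered in the wrong place, no matter how small the variance gets.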

3 popular prediction methods for high dimensional feature spaces
- Lasso for linear regression
  - The solution of a convex problem, you can find beta hat
  - Sparse estimator
  - What is the meaning of beta hat? interpretation of the parameter.
  - How do you choose lambda? typically by cross-validation, which balances bias and variance
- random forests
  - non-parametric; A piece-wise constant estimator done by recursive splitting
  - for regression or classification
  - Simple and interpretable
- deep learning
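A minimal sketch of the lasso bullets above: coordinate descent for the convex problem, with lambda chosen by K-fold cross-validation. This is not code from the talk. The function names, the simulated data, and the lambda grid are all assumptions made for illustration, and there is no intercept because the simulated data are centered at zero.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values inside [-t, t] become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()                      # equals y - X @ beta while beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]    # remove coordinate j's contribution
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]    # add the updated contribution back
    return beta

def cv_lasso(X, y, lams, k=5, seed=0):
    """Pick lambda by K-fold cross-validated squared prediction error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    cv_err = []
    for lam in lams:
        errs = []
        for fold in folds:
            mask = np.ones(len(y), dtype=bool)
            mask[fold] = False            # hold this fold out
            b = lasso_cd(X[mask], y[mask], lam)
            errs.append(np.mean((y[fold] - X[fold] @ b) ** 2))
        cv_err.append(np.mean(errs))
    best = lams[int(np.argmin(cv_err))]
    return best, lasso_cd(X, y, best)

# Simulated data (hypothetical): 20 features, only the first two matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
beta_true = np.zeros(20)
beta_true[:2] = [3.0, -2.0]
y = X @ beta_true + rng.normal(size=100)

lams = np.logspace(-3, 0, 8)
best_lam, beta_hat = cv_lasso(X, y, lams)
print("chosen lambda:", best_lam)
print("nonzero coefficients:", np.flatnonzero(beta_hat))
```

Cross-validation tunes lambda for prediction error, which is why the notes go on to ask what beta hat means: the selected coefficients are shrunk and the selected set is random.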

Post-selection inference: what is the parameter we're estimating?
- True parameter view: assumption that the true model that generated the data is linear. High dimensional linear model. Probably bogus assumption. Don't believe the model. Don't act like the model is true. Why would the true model be linear? Assumption of linearity isn't testable. We regularize to take an ill-posed problem and make it well-posed.
- Projection parameter view: the Stanfordian point of view. Don't interpret the beta that comes out as a true beta, just as the best approximation: the best linear predictor, i.e. the projection onto the space of linear predictors. The model is not linear, but we can still do inference for the projection parameter. We do model selection, choosing a subset of variables based on the data, and Beta_s = the best linear predictor on the (say) 10 variables selected. This is inference for random parameters: the data are random, so the selected parameter is random too. Not a smooth parameter, so you can't just bootstrap.
- LOCO parameters (leave out covariates): Fit something, then look at the prediction error on a new pair. Suppose I didn't have access to covariate j: how much does my prediction change? Rerun the entire process without that variable, get a new prediction, and look at the difference in prediction error. How much would I pay in prediction by not knowing the variable? Doesn't depend on linearity or model correctness; very interpretable. The bootstrap is much more accurate for this than it is for estimating other parameters.
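The LOCO idea can be sketched as follows. This is an illustration under assumptions, not the speaker's code. "The entire process" is taken here to be plain least squares on a training half, the increase in absolute prediction error is measured on the held-out half, and a percentile bootstrap is run over the held-out differences. The names loco and bootstrap_ci and the simulated data are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least squares with an intercept; returns [intercept, slopes...]."""
    Xd = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]

def loco(X, y, j, seed=0):
    """Per-point increase in absolute test error when covariate j is withheld."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    half = len(y) // 2
    train, test = idx[:half], idx[half:]
    keep = [k for k in range(X.shape[1]) if k != j]

    full = fit_ols(X[train], y[train])
    reduced = fit_ols(X[train][:, keep], y[train])   # rerun without covariate j

    err_full = np.abs(y[test] - predict_ols(full, X[test]))
    err_reduced = np.abs(y[test] - predict_ols(reduced, X[test][:, keep]))
    return err_reduced - err_full

def bootstrap_ci(delta, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for the mean of the held-out differences."""
    rng = np.random.default_rng(seed)
    means = rng.choice(delta, size=(n_boot, len(delta)), replace=True).mean(axis=1)
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

# Simulated data (hypothetical): x0 matters a lot, x1 not at all, x2 a little.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=400)

delta0 = loco(X, y, j=0)
delta1 = loco(X, y, j=1)
ci0 = bootstrap_ci(delta0)
print("mean LOCO for x0:", delta0.mean(), "bootstrap CI:", ci0)
print("mean LOCO for x1:", delta1.mean())
```

Nothing in the quantity itself assumes linearity: fit_ols can be swapped for any fitting procedure, including one that does its own model selection, as long as the whole procedure is rerun without covariate j.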

Conformalization - instead of leave one out, it's add one in.
- R code package conformal
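A rough Python sketch of conformal prediction, for readers who don't use R. Note the swap: the "add one in" description above is full conformal, where the candidate point is added and the model refit. To keep this short it uses the cheaper split variant instead: fit on one half, calibrate on the other. The function name, the straight-line base model, and the simulated data are assumptions for illustration; the model is deliberately misspecified to show that coverage doesn't rely on it being right.

```python
import numpy as np

def split_conformal(x_train, y_train, x_cal, y_cal, alpha=0.1):
    """Return a predictor and a half-width q so that [f(x) - q, f(x) + q]
    covers a new y with probability at least 1 - alpha (marginally)."""
    slope, intercept = np.polyfit(x_train, y_train, 1)   # straight-line fit

    def predict(x):
        return intercept + slope * x

    # Conformity scores: absolute residuals on the calibration half.
    scores = np.sort(np.abs(y_cal - predict(x_cal)))
    n = len(scores)
    k = int(np.ceil((1 - alpha) * (n + 1)))              # finite-sample rank
    q = scores[k - 1] if k <= n else np.inf
    return predict, q

# Simulated data (hypothetical): the truth is a sine curve, not a line.
rng = np.random.default_rng(0)

def simulate(n):
    x = rng.uniform(-3, 3, size=n)
    y = np.sin(x) + 0.3 * rng.normal(size=n)
    return x, y

x_tr, y_tr = simulate(500)
x_cal, y_cal = simulate(500)
x_te, y_te = simulate(2000)

predict, q = split_conformal(x_tr, y_tr, x_cal, y_cal, alpha=0.1)
coverage = np.mean(np.abs(y_te - predict(x_te)) <= q)
print("half-width:", q, "empirical coverage on fresh data:", coverage)
```

The guarantee is marginal, averaged over x. A poor base model still gets valid coverage, just with wider intervals.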

Inference Techniques for p >> N

debiasing - (Javanmard and Montanari 2014) - no model selection. Lasso gives you a sparse estimator but it's biased, so debias
conditional - target the projection parameter. Assume error is random. Select a model, get beta, choose an event E, condi
uniform
sample splitting, projection
sample splitting, LOCO
Conformal
forgotten methods - unsupervised dimensionality reduction, variable clustering, non-linear dimension reduction
Add screenshot

Coverage

What's the probability that your interval covers the truth?
Ideal coverage: the probability that theta is in the CI is 1 - alpha
Confidence intervals that are valid no matter how you pick the model
Conditioning on some event is like a truncated normal indexed on one parameter.
Fragile because you're comparing two tail probabilities.
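A small simulation of what coverage means, and of why picking the model from the data can break it. This is a sketch under stated assumptions, not something from the talk. The setup is hypothetical: p group means that are all truly zero, with a known standard error; "selection" means reporting only the largest sample mean and building the usual interval around it.

```python
import numpy as np
from statistics import NormalDist

def coverage_after_selection(p, sigma=1.0, n=50, alpha=0.05, n_sim=20000, seed=0):
    """Fraction of simulations in which the naive interval around the largest
    of p sample means covers that mean's true value (zero). With p = 1 there
    is no selection, so this is ordinary coverage."""
    rng = np.random.default_rng(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / np.sqrt(n)
    means = rng.normal(0.0, se, size=(n_sim, p))   # draw sample means directly
    picked = means.max(axis=1)                     # data-driven selection
    return np.mean(np.abs(picked - 0.0) <= z * se)

cov_no_selection = coverage_after_selection(p=1)
cov_selected = coverage_after_selection(p=10)
print("no selection:", cov_no_selection)
print("after picking the largest of 10:", cov_selected)
```

In this toy setup the coverage after selection can be worked out exactly as 0.975^p - 0.025^p, about 0.78 for p = 10, so a nominal 95% interval is well short. That shortfall is what intervals "valid no matter how you pick the model" are meant to repair.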

Books

The Elements of Statistical Learning

Statistical Inference
