Abstract

Model selection and inference are usually treated as separate stages of regression analysis, even though both tasks are performed on the same set of data. Once a model has been selected, one typically proceeds as though one has a fresh data set generated by the selected model. Here, we present Monte Carlo results on the coverage rates of confidence regions for the regression parameters, conditional on the selected model order. The conditional coverage rates are much smaller than the nominal coverage rates, obtained by assuming that the model was known in advance. Furthermore, the overall coverage rate is much smaller than the nominal value. A possible remedy based on data splitting is suggested.
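The kind of experiment described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' design: it assumes a toy setting with two nested no-intercept candidate models, a true model of order 1, a noise variance known to equal 1 (so AIC reduces to RSS + 2k and normal-theory intervals are exact for a fixed model), and AIC as the selection rule. The names `fit_both_orders` and `simulate`, and every default parameter value, are hypothetical choices made for illustration. Setting `split=True` selects the order on the first half of the data and builds the interval on the second half, a simple form of data splitting.

```python
import random
from statistics import NormalDist


def fit_both_orders(x1, x2, y):
    """OLS fits (no intercept) of order 1 (x1 only) and order 2 (x1 and x2).

    Returns {order: (estimate of beta1, its standard error, AIC)}, assuming
    the noise variance is known to be 1, so AIC is RSS + 2 * order.
    """
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    yy = sum(c * c for c in y)
    det = s11 * s22 - s12 * s12
    b_small = t1 / s11
    b1_full = (s22 * t1 - s12 * t2) / det
    b2_full = (s11 * t2 - s12 * t1) / det
    rss_small = yy - b_small * t1
    rss_full = yy - b1_full * t1 - b2_full * t2
    return {
        1: (b_small, (1 / s11) ** 0.5, rss_small + 2),
        2: (b1_full, (s22 / det) ** 0.5, rss_full + 4),
    }


def simulate(split=False, n_obs=30, n_rep=4000, rho=0.9, beta1=1.0,
             level=0.95, seed=0):
    """Monte Carlo coverage of the interval for beta1 after order selection."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Fixed design (assumption): x2 is strongly correlated with x1.
    x1 = [rng.gauss(0, 1) for _ in range(n_obs)]
    x2 = [rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for a in x1]
    half = n_obs // 2
    select = slice(0, half) if split else slice(None)
    infer = slice(half, None) if split else slice(None)
    hits = {1: 0, 2: 0}
    counts = {1: 0, 2: 0}
    for _ in range(n_rep):
        # True model has order 1: the coefficient of x2 is zero.
        y = [beta1 * a + rng.gauss(0, 1) for a in x1]
        fits = fit_both_orders(x1[select], x2[select], y[select])
        order = min(fits, key=lambda k: fits[k][2])  # smallest AIC wins
        est, se, _ = fit_both_orders(x1[infer], x2[infer], y[infer])[order]
        counts[order] += 1
        hits[order] += abs(est - beta1) <= z * se
    conditional = {k: hits[k] / counts[k] if counts[k] else None
                   for k in counts}
    return {"conditional": conditional,
            "overall": sum(hits.values()) / n_rep,
            "counts": counts}


if __name__ == "__main__":
    print("same data:", simulate(split=False))
    print("split data:", simulate(split=True))
```

In this toy setting the shortfall is concentrated in the replications where the larger model is selected, which also pulls the overall rate below the nominal level; with `split=True` the selection step and the interval use independent observations, so the interval should recover close to its nominal coverage at the cost of using only half the data for estimation.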

Keywords

Inference, Statistics, Model selection, Linear regression, Data set, Regression analysis, Selection (genetic algorithm), Econometrics, Regression, Confidence interval, Mathematics, Set (abstract data type), Monte Carlo method, Computer science, Artificial intelligence

Publication Info

Year: 1990
Type: article
Volume: 44
Issue: 3
Pages: 214-217
Citations: 276
Access: Closed

Citation Metrics

276 (OpenAlex)

Cite This

Clifford M. Hurvich, Chih‐Ling Tsai (1990). The Impact of Model Selection on Inference in Linear Regression. The American Statistician, 44(3), 214-217. https://doi.org/10.1080/00031305.1990.10475722

Identifiers

DOI: 10.1080/00031305.1990.10475722