Prediction Error and Its Estimation for Subset-Selected Models

Ellen B. Roecker

doi:10.2307/1269417

Abstract

Strategies are compared for development of a linear regression model and the subsequent assessment of its predictive ability. Simulations were performed as a designed experiment over a range of data structures. Approaches using a forward selection of variables resulted in slightly smaller prediction errors and less biased estimators of predictive accuracy than all possible subsets selection but often did not improve on the full model. Random and balanced data splitting resulted in increased prediction errors and estimators with large mean squared error. To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample usually be used for model development and assessment.

KEY WORDS: Cross-validation; Data splitting; Linear regression; Stochastic predictor variables
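The contrast the abstract draws between using the entire sample and random data splitting can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's designed experiment: the sample size, number of predictors, coefficient values, noise level, and the helper names `ols_fit`, `mse`, and `simulate` are all hypothetical choices made for illustration, and only the full model (no variable selection) is fitted.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares coefficients, with an intercept column prepended."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def mse(beta, X, y):
    """Mean squared prediction error of a fitted model on (X, y)."""
    A = np.column_stack([np.ones(len(X)), X])
    return float(np.mean((y - A @ beta) ** 2))


def simulate(n=40, p=5, reps=500, n_new=2000, seed=0):
    """Compare full-sample fitting with a random half split.

    All settings here are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    true_beta = rng.normal(size=p)  # hypothetical true coefficients
    full_err, split_err, split_est = [], [], []
    for _ in range(reps):
        # Stochastic predictors: X is redrawn in every replicate.
        X = rng.normal(size=(n, p))
        y = X @ true_beta + rng.normal(size=n)
        # Large independent sample used to approximate true prediction error.
        X_new = rng.normal(size=(n_new, p))
        y_new = X_new @ true_beta + rng.normal(size=n_new)

        # Strategy A: develop the model on the entire sample.
        b_full = ols_fit(X, y)
        full_err.append(mse(b_full, X_new, y_new))

        # Strategy B: random split, fit on one half, assess on the other.
        idx = rng.permutation(n)
        train, hold = idx[: n // 2], idx[n // 2:]
        b_half = ols_fit(X[train], y[train])
        split_err.append(mse(b_half, X_new, y_new))
        split_est.append(mse(b_half, X[hold], y[hold]))

    return {
        "full_true_error": float(np.mean(full_err)),
        "split_true_error": float(np.mean(split_err)),
        # Spread of the holdout estimate around the split model's true error.
        "split_estimate_mse": float(
            np.mean((np.array(split_est) - np.array(split_err)) ** 2)
        ),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumed settings, the model fitted on half the data would be expected to predict new observations less accurately than the model fitted on the whole sample, and the holdout estimate is based on only `n // 2` observations, which is one way the large estimator error reported for data splitting can arise.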

Keywords

Estimation, Statistics, Mathematics, Computer science, Economics

Affiliated Institutions

University of Wisconsin–Madison US

Related Publications

Summarizing the predictive power of a generalized linear model

Beiyao Zheng, Alan Agresti

This paper studies summary measures of the predictive power of a generalized linear model, paying special attention to a generalization of the multiple correlation coefficient f...

2000 Statistics in Medicine 272 citations

Introduction to Variance Estimation.

Benjamin F. King, Kirk M. Wolter

The Method of Random Groups.- Variance Estimation Based on Balanced Half-Samples.- The Jackknife Method.- The Bootstrap Method.- Taylor Series Methods.- Generalized Variance Fun...

1987 Journal of the American Statistical A... 1387 citations

The Little Bootstrap and other Methods for Dimensionality Selection in Regression: X-Fixed Prediction Error

Leo Breiman

Abstract When a regression problem contains many predictor variables, it is rarely wise to try to fit the data by means of a least squares regression on all of the predictor var...

1992 Journal of the American Statistical A... 272 citations

A review of methods for the assessment of prediction errors in conservation presence/absence models

Alan H. Fielding, John F. Bell

Predicting the distribution of endangered species from habitat data is frequently perceived to be a useful technique. Models that predict the presence or absence of a species ar...

1997 Environmental Conservation 6696 citations

Variable Bandwidth Kernel Estimators of Regression Curves

Hans‐Georg Müller, Ulrich Stadtmüller

In the model $Y_i = g(t_i) + \varepsilon_i,\quad i = 1,\cdots, n,$ where $Y_i$ are given observations, $\varepsilon_i$ i.i.d. noise variables and $t_i$ nonrandom design poin...

1987 The Annals of Statistics 159 citations

Publication Info

Year: 1991
Type: article
Volume: 33
Issue: 4
Pages: 459-459
Citations: 123
Access: Closed

Citation Metrics

123

OpenAlex

Cite This

APA Style

                            
Ellen B. Roecker (1991). Prediction Error and Its Estimation for Subset-Selected Models. Technometrics, 33(4), 459-459. https://doi.org/10.2307/1269417

Identifiers

DOI: 10.2307/1269417