Abstract
Strategies are compared for the development of a linear regression model and the subsequent assessment of its predictive ability. Simulations were performed as a designed experiment over a range of data structures. Approaches using forward selection of variables resulted in slightly smaller prediction errors and less biased estimators of predictive accuracy than all-possible-subsets selection, but often did not improve on the full model. Random and balanced data splitting resulted in increased prediction errors and in estimators with large mean squared error. To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample usually be used for both model development and assessment.

KEY WORDS: Cross-validation; Data splitting; Linear regression; Stochastic predictor variables
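The strategies compared above can be illustrated with a small sketch. It is not the paper's simulation design. It assumes simulated data (five standard normal predictors, two with nonzero coefficients), forward selection by residual sum of squares with a fixed subset size k = 2, and hypothetical helper names. It contrasts three estimates of prediction error for the selected model: the apparent (resubstitution) error on the full sample, leave-one-out cross-validation that repeats the selection inside every fold so the estimate accounts for the selection step, and a single random 50/50 data split.

```python
import random

def ols_fit(X, y, cols):
    """Least-squares fit with intercept on columns `cols`; returns [b0, b1, ...]."""
    A = [[1.0] + [row[j] for j in cols] for row in X]
    p = len(A[0])
    # Normal equations (A^T A) b = A^T y.
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)] for i in range(p)]
    v = [sum(a[i] * yk for a, yk in zip(A, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        v[c], v[piv] = v[piv], v[c]
        for r in range(c + 1, p):
            f = M[r][c] / M[c][c]
            for k in range(c, p):
                M[r][k] -= f * M[c][k]
            v[r] -= f * v[c]
    # Back substitution on the upper-triangular system.
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (v[i] - sum(M[i][k] * b[k] for k in range(i + 1, p))) / M[i][i]
    return b

def predict(b, row, cols):
    return b[0] + sum(bj * row[j] for bj, j in zip(b[1:], cols))

def rss(X, y, cols):
    b = ols_fit(X, y, cols)
    return sum((yi - predict(b, row, cols)) ** 2 for row, yi in zip(X, y))

def forward_select(X, y, k):
    """Greedily add the predictor that most reduces RSS until k are chosen."""
    chosen, remaining = [], list(range(len(X[0])))
    for _ in range(k):
        best = min(remaining, key=lambda j: rss(X, y, chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

def loo_prediction_error(X, y, k):
    """Leave-one-out mean squared prediction error, re-running selection in each fold."""
    errs = []
    for i in range(len(y)):
        Xtr, ytr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        cols = forward_select(Xtr, ytr, k)
        b = ols_fit(Xtr, ytr, cols)
        errs.append((y[i] - predict(b, X[i], cols)) ** 2)
    return sum(errs) / len(errs)

def split_prediction_error(X, y, k, train_frac=0.5, seed=0):
    """Random data splitting: select and fit on one part, assess on the held-out part."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    m = int(train_frac * len(y))
    tr, te = idx[:m], idx[m:]
    Xtr, ytr = [X[i] for i in tr], [y[i] for i in tr]
    cols = forward_select(Xtr, ytr, k)
    b = ols_fit(Xtr, ytr, cols)
    return sum((y[i] - predict(b, X[i], cols)) ** 2 for i in te) / len(te)

# Hypothetical simulated data: y depends on predictors 0 and 1 only.
random.seed(1)
n, p = 40, 5
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * r[0] - 2 * r[1] + random.gauss(0, 0.5) for r in X]

apparent = rss(X, y, forward_select(X, y, 2)) / n
loo = loo_prediction_error(X, y, 2)
split_err = split_prediction_error(X, y, 2)
print(f"apparent={apparent:.3f}  loo={loo:.3f}  split={split_err:.3f}")
```

For a fixed least-squares fit the leave-one-out residual is e_i / (1 - h_ii), so whenever the same subset is selected in every fold the cross-validated error is at least the apparent error; the split estimate, by contrast, uses a model fitted to only half of the observations.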
Publication Info
- Year
- 1991
- Type
- article
- Volume
- 33
- Issue
- 4
- Pages
- 459-459
- Citations
- 123
Identifiers
- DOI
- 10.2307/1269417