The Relationship Between Variable Selection and Data Agumentation and a Method for Prediction

David M. Allen

doi:10.1080/00401706.1974.10489157

Abstract

We show that data augmentation provides a rather general formulation for the study of biased prediction techniques using multiple linear regression. Variable selection is a limiting case, and Ridge regression is a special case of data augmentation. We propose a way to obtain predictors given a credible criterion of good prediction.
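The two relationships named in the abstract can be illustrated numerically. The sketch below is not taken from the paper: it relies on the standard identity that appending the rows `diag(sqrt(k))` to the design matrix, with zeros appended to the response, turns ordinary least squares into least squares with a penalty `sum_j k[j] * b[j]**2`. The function names, the simulated data, and the penalty values are all illustrative assumptions, and whether this matches the paper's exact formulation is also an assumption.

```python
import numpy as np

def augmented_least_squares(X, y, k):
    """OLS on data augmented with p pseudo-observations (illustrative helper).

    Appends rows diag(sqrt(k)) to X and zeros to y, which minimises
    ||y - X b||^2 + sum_j k[j] * b[j]^2.
    """
    k = np.asarray(k, dtype=float)
    X_aug = np.vstack([X, np.diag(np.sqrt(k))])
    y_aug = np.concatenate([y, np.zeros(X.shape[1])])
    beta, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return beta

def ridge_closed_form(X, y, lam):
    """Textbook ridge estimator (X'X + lam I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Simulated data, chosen only for the demonstration.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, 0.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=30)

# Special case: equal augmentation weights reproduce ridge regression.
lam = 2.0
b_ridge = ridge_closed_form(X, y, lam)
b_equal = augmented_least_squares(X, y, np.full(4, lam))

# Limiting case: a very large weight on one coefficient approximates
# deleting that variable and fitting OLS on the remaining ones.
drop = 3
b_limit = augmented_least_squares(X, y, [0.0, 0.0, 0.0, 1e8])
b_subset, *_ = np.linalg.lstsq(np.delete(X, drop, axis=1), y, rcond=None)
```

With equal weights the augmented fit agrees with the closed-form ridge estimate, and as a single weight grows the corresponding coefficient shrinks toward zero while the others approach the subset-regression fit.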

Keywords

Limiting, Linear regression, Selection (genetic algorithm), Regression, Feature selection, Variable (mathematics), Ridge, Regression analysis, Computer science, Statistics, Mathematics, Econometrics, Data mining, Artificial intelligence, Engineering

Affiliated Institutions

University of Kentucky, US

Publication Info

Year: 1974
Type: article
Volume: 16
Issue: 1
Pages: 125-127
Citations: 1349
Access: Closed

Cite This

APA Style

David M. Allen (1974). The Relationship Between Variable Selection and Data Agumentation and a Method for Prediction. Technometrics, 16(1), 125-127. https://doi.org/10.1080/00401706.1974.10489157
