Abstract
The Lasso, the Forward Stagewise regression and the Lars are closely related procedures recently proposed for linear regression problems. Each of them can produce sparse models and can be used both for estimation and variable selection. In practical implementations these algorithms are typically tuned to achieve optimal prediction accuracy. We show that, when prediction accuracy is used as the criterion to choose the tuning parameter, these procedures are in general not consistent in terms of variable selection. That is, the sets of variables they select do not consistently recover the true set of important variables. In particular, we show that for any sample size n, when there are superfluous variables in the linear regression model and the design matrix is orthogonal, the probability of the procedures correctly identifying the true set of important variables is less than a constant (smaller than one) that does not depend on n. This result is also shown to hold for two-dimensional problems with general correlated design matrices. The results indicate that in problems where …
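As a rough illustration of the abstract's claim (not part of the paper), the Python sketch below simulates repeated Lasso fits on an orthogonal design containing superfluous variables, with the penalty chosen for prediction accuracy via cross-validation using scikit-learn's LassoCV. The design, the coefficient values and the use of LassoCV are assumptions made for illustration only; the point is that the fraction of replications in which the selected support exactly equals the true support stays noticeably below one.

```python
# Illustrative simulation only (our assumptions, not the paper's code): Lasso tuned by
# cross-validated prediction error on an orthogonal design with superfluous variables.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 100, 8                       # sample size and number of predictors
true_support = {0, 1, 2}            # indices of the truly important variables
beta = np.zeros(p)
beta[[0, 1, 2]] = [3.0, 2.0, 1.5]   # the remaining p - 3 variables are superfluous

n_reps, n_correct = 200, 0
for _ in range(n_reps):
    # Orthogonalized design: columns of X are orthogonal, each with squared norm n
    X = np.linalg.qr(rng.standard_normal((n, p)))[0] * np.sqrt(n)
    y = X @ beta + rng.standard_normal(n)

    # Tuning parameter chosen for prediction accuracy (5-fold cross-validation)
    fit = LassoCV(cv=5).fit(X, y)
    selected = set(np.flatnonzero(fit.coef_))
    n_correct += (selected == true_support)

# Even with a strong signal, the empirical probability of selecting exactly the
# true set stays bounded away from one; increasing n does not change this.
print("P(selected set == true set) approx", n_correct / n_reps)
```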
Publication Info
- Year: 2006
- Type: article
- Citations: 255
- Access: Closed