Model Selection and Estimation in Regression with Grouped Variables

2005 Journal of the Royal Statistical Society Series B (Statistical Methodology) 7,270 citations

Abstract

We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations with the multifactor analysis-of-variance problem as the most important and well-known example. Instead of selecting factors by stepwise backward elimination, we focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non-negative garrotte for factor selection. The lasso, the LARS algorithm and the non-negative garrotte are recently proposed regression methods that can be used to select individual variables. We study and propose efficient algorithms for the extensions of these methods for factor selection and show that these extensions give superior performance to the traditional stepwise backward elimination method in factor selection problems. We study the similarities and the differences between these methods. Simulations and real examples are used to illustrate the methods.
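As a rough illustration of the lasso extension for factor selection mentioned in the abstract, the sketch below minimizes a group-penalized least-squares objective, 0.5 * ||y - Xb||^2 + lam * sum_j sqrt(p_j) * ||b_j||_2, in which whole groups of coefficients are shrunk and dropped together. It is a minimal sketch under stated assumptions, not the authors' algorithm: it uses a generic proximal gradient iteration, and the names `group_soft_threshold`, `group_lasso`, `groups`, `lam` and `n_iter`, as well as the synthetic data, are hypothetical choices made for this example.

```python
import numpy as np


def group_soft_threshold(block, threshold):
    """Shrink a coefficient block toward zero; zero it out if its norm <= threshold."""
    norm = np.linalg.norm(block)
    if norm <= threshold:
        return np.zeros_like(block)
    return (1.0 - threshold / norm) * block


def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient sketch (an assumption, not the paper's algorithm) for
    0.5 * ||y - X b||^2 + lam * sum_j sqrt(p_j) * ||b_j||_2,
    where `groups` is a list of column-index arrays and p_j is the size of group j.
    """
    beta = np.zeros(X.shape[1])
    # Step size 1/L, with L the largest eigenvalue of X'X (squared spectral norm of X).
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        # Gradient step on the squared-error term only.
        z = beta - step * (X.T @ (X @ beta - y))
        # Proximal step: shrink each group as a whole block.
        for idx in groups:
            beta[idx] = group_soft_threshold(z[idx], step * lam * np.sqrt(len(idx)))
    return beta


# Hypothetical synthetic data: three factors of two columns each, only the first active.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
groups = [np.arange(0, 2), np.arange(2, 4), np.arange(4, 6)]
y = X[:, :2] @ np.array([2.0, -1.0]) + 0.1 * rng.standard_normal(50)
beta = group_lasso(X, y, groups, lam=5.0)
```

Because the threshold acts on the Euclidean norm of each block, a factor is either kept with all its coefficients shrunk or removed entirely, which is the grouped analogue of the coordinate-wise selection performed by the ordinary lasso.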

Keywords

Lasso (programming language); Selection (genetic algorithm); Stepwise regression; Feature selection; Variance (accounting); Computer science; Regression; Regression analysis; Focus (optics); Estimation; Machine learning; Artificial intelligence; Statistics; Mathematics; Engineering

Related Publications

Subset Selection in Regression

OBJECTIVES Prediction, Explanation, Elimination or What? How Many Variables in the Prediction Formula? Alternatives to Using Subsets 'Black Box' Use of Best-Subsets Techniques L...

2003 Technometrics 1482 citations

Publication Info

Year: 2005
Type: article
Volume: 68
Issue: 1
Pages: 49-67
Citations: 7270
Access: Closed

Cite This

Ming Yuan, Yi Lin (2005). Model Selection and Estimation in Regression with Grouped Variables. Journal of the Royal Statistical Society Series B (Statistical Methodology), 68(1), 49-67. https://doi.org/10.1111/j.1467-9868.2005.00532.x

Identifiers

DOI: 10.1111/j.1467-9868.2005.00532.x