Abstract

Variable selection can be challenging, particularly in situations with a large number of predictors with possibly high correlations, such as gene expression data. In this article, a new method called the OSCAR (octagonal shrinkage and clustering algorithm for regression) is proposed to simultaneously select variables while grouping them into predictive clusters. In addition to improving prediction accuracy and interpretation, these resulting groups can then be investigated further to discover what contributes to the group having a similar behavior. The technique is based on penalized least squares with a geometrically intuitive penalty function that shrinks some coefficients to exactly zero. Additionally, this penalty yields exact equality of some coefficients, encouraging correlated predictors that have a similar effect on the response to form predictive clusters represented by a single coefficient. The proposed procedure is shown to compare favorably to the existing shrinkage and variable selection techniques in terms of both prediction error and model complexity, while yielding the additional grouping information.
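The abstract does not state the penalty explicitly. In the constrained form usually associated with OSCAR, one minimizes ||y - X*beta||^2 subject to sum_j |beta_j| + c * sum_{j<k} max{|beta_j|, |beta_k|} <= t, where c >= 0 controls the pairwise grouping term and t the overall amount of shrinkage. The sketch below is a minimal illustration of fitting such a constrained criterion with a generic convex solver; the toy data, the tuning values c and t, and the choice of cvxpy are assumptions made for illustration, not details taken from the paper.

```python
# Minimal sketch: OSCAR-style constrained least squares via cvxpy.
# Assumed form: minimize ||y - X beta||^2
#               s.t. ||beta||_1 + c * sum_{j<k} max(|beta_j|, |beta_k|) <= t
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p = 50, 6
X = rng.standard_normal((n, p))
# Make predictors 0 and 1 highly correlated with similar effects,
# the situation in which grouping is expected.
X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(n)
true_beta = np.array([2.0, 2.0, 0.0, 0.0, 1.0, 0.0])
y = X @ true_beta + rng.standard_normal(n)

c, t = 1.0, 4.0                      # illustrative tuning values
beta = cp.Variable(p)
fit = cp.sum_squares(y - X @ beta)   # least-squares loss
l1 = cp.norm1(beta)                  # L1 part: drives coefficients to zero
pairwise = sum(cp.maximum(cp.abs(beta[j]), cp.abs(beta[k]))
               for j in range(p) for k in range(j + 1, p))
prob = cp.Problem(cp.Minimize(fit), [l1 + c * pairwise <= t])
prob.solve()
print(np.round(beta.value, 3))       # some coefficients near zero, some pairs tied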

Keywords

Feature selection, Cluster analysis, Shrinkage, Regression, Selection (genetic algorithm), Variable (mathematics), Mathematics, Elastic net regularization, Regression analysis, Linear regression, Function (biology), Computer science, Statistics, Penalty method, Pattern recognition (psychology), Algorithm, Artificial intelligence, Mathematical optimization

Publication Info

Year: 2007
Type: article
Volume: 64
Issue: 1
Pages: 115-123
Citations: 456
Access: Closed

Citation Metrics

456 citations (source: OpenAlex)

Cite This

Howard D. Bondell, Brian J. Reich (2007). Simultaneous Regression Shrinkage, Variable Selection, and Supervised Clustering of Predictors with OSCAR. Biometrics, 64(1), 115-123. https://doi.org/10.1111/j.1541-0420.2007.00843.x

Identifiers

DOI: 10.1111/j.1541-0420.2007.00843.x