Abstract
Summary Variable selection can be challenging, particularly in situations with a large number of predictors with possibly high correlations, such as gene expression data. In this article, a new method called the OSCAR (octagonal shrinkage and clustering algorithm for regression) is proposed to simultaneously select variables while grouping them into predictive clusters. In addition to improving prediction accuracy and interpretation, these resulting groups can then be investigated further to discover what contributes to the group having a similar behavior. The technique is based on penalized least squares with a geometrically intuitive penalty function that shrinks some coefficients to exactly zero. Additionally, this penalty yields exact equality of some coefficients, encouraging correlated predictors that have a similar effect on the response to form predictive clusters represented by a single coefficient. The proposed procedure is shown to compare favorably to the existing shrinkage and variable selection techniques in terms of both prediction error and model complexity, while yielding the additional grouping information.
Keywords
Affiliated Institutions
Related Publications
Collinearity: a review of methods to deal with it and a simulation study evaluating their performance
Collinearity refers to the non independence of predictor variables, usually in a regression‐type analysis. It is a common feature of any descriptive ecological data set and can ...
Regression Shrinkage and Selection Via the Lasso
SUMMARY We propose a new method for estimation in linear models. The ‘lasso’ minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients b...
How Many Genes Are Needed for a Discriminant Microarray Data Analysis ?
The analysis of the leukemia data from Whitehead/MIT group is a discriminant analysis (also called a supervised learning). Among thousands of genes whose expression levels are m...
Stability-Based Validation of Clustering Solutions
Data clustering describes a set of frequently employed techniques in exploratory data analysis to extract “natural” group structure in data. Such groupings need to be validated ...
Partial least squares regression and projection on latent structure regression (PLS Regression)
Abstract Partial least squares (PLS) regression ( a.k.a. projection on latent structures) is a recent technique that combines features from and generalizes principal component a...
Publication Info
- Year
- 2007
- Type
- article
- Volume
- 64
- Issue
- 1
- Pages
- 115-123
- Citations
- 456
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1111/j.1541-0420.2007.00843.x