Abstract
Advances in computational biology have made simultaneous monitoring of thousands of features possible. High-throughput technologies not only bring a much richer information context in which to study various aspects of gene function, but they also present the challenge of analyzing data with a large number of covariates and few samples. As an integral part of machine learning, classification of samples into two or more categories is almost always of interest to scientists. We address the question of classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, to generalized linear regression, building on a previous approach, iteratively reweighted partial least squares (IRWPLS). We compare our results with two-stage PLS and with other classifiers. We show that by phrasing the problem in a generalized linear model setting and by applying Firth's procedure to avoid (quasi)separation, we often obtain lower classification error rates.
Keywords
Cross-validation; Firth's procedure; Gene expression; Iteratively reweighted partial least squares; (Quasi)separation; Two-stage PLS
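The two-stage PLS baseline mentioned in the abstract first reduces the many gene-expression covariates to a few latent scores, then classifies on those scores. A minimal sketch of the dimension reduction step (NIPALS-style PLS1 on hypothetical toy data) might look as follows; this is an illustrative sketch only, not the paper's IRWPLS/Firth implementation:

```python
import numpy as np

def pls1_scores(X, y, n_comp=2):
    """Minimal PLS1 (NIPALS) dimension reduction: returns the score matrix T.

    X: (n, p) predictor matrix; y: (n,) response (e.g., 0/1 class labels).
    A two-stage classifier would fit, say, logistic regression on T.
    """
    X = X - X.mean(axis=0)              # center predictors
    y = y - y.mean()                    # center response
    T = np.empty((X.shape[0], n_comp))
    for k in range(n_comp):
        w = X.T @ y
        w /= np.linalg.norm(w)          # weight vector (unit length)
        t = X @ w                       # score vector for component k
        p = X.T @ t / (t @ t)           # X-loading
        X = X - np.outer(t, p)          # deflate X
        y = y - t * (y @ t) / (t @ t)   # deflate y
        T[:, k] = t
    return T

# Hypothetical toy data: 20 samples, 100 "genes", two classes
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))
y = np.repeat([0.0, 1.0], 10)
T = pls1_scores(X, y, n_comp=2)
print(T.shape)  # (20, 2)
```

With few samples and 0/1 responses, fitting a logistic model on such scores can still hit (quasi)separation, which is where the paper's use of Firth's penalized likelihood comes in.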
Publication Info
- Year: 2005
- Type: article
- Volume: 14
- Issue: 2
- Pages: 280-298
- Citations: 98
- Access: Closed
Identifiers
- DOI: 10.1198/106186005x47697