Abstract
In this paper we derive a method for evaluating and improving techniques for selecting informative genes from microarray data. Genes of interest are typically selected by ranking all genes according to a test statistic and then choosing the top k genes. A problem with this approach is that many of the top-ranked genes are highly correlated with one another, so they contribute largely redundant information. For classification purposes it would be ideal to have genes that are distinct but still highly informative. We propose three different pre-filter methods (two based on clustering and one based on correlation) to retrieve groups of similar genes. We then apply a test statistic to these groups to select the final genes of interest. We show that this filtered set of genes can be used to significantly improve existing classifiers.
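To make the idea of a correlation-based pre-filter concrete, the following is a minimal sketch and not the procedure from the paper itself. It assumes a two-class problem, uses the absolute Welch t-statistic as the ranking statistic, and replaces explicit grouping with a greedy rule: a gene is kept only if its absolute Pearson correlation with every gene already kept is below a threshold. The function names `welch_t` and `select_genes`, the parameter `corr_threshold`, and its default of 0.9 are all hypothetical choices made for illustration.

```python
import numpy as np

def welch_t(X, y):
    """Absolute Welch t-statistic per gene.

    X is a (samples x genes) array, y holds 0/1 class labels.
    """
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)

def select_genes(X, y, k=2, corr_threshold=0.9):
    """Greedy correlation pre-filter followed by top-k ranking (illustrative only)."""
    scores = welch_t(X, y)
    corr = np.abs(np.corrcoef(X, rowvar=False))  # gene-by-gene |Pearson r|
    selected = []
    # Visit genes from most to least informative; keep a gene only if it is
    # not highly correlated with any gene that was already kept.
    for g in np.argsort(-scores):
        if all(corr[g, s] < corr_threshold for s in selected):
            selected.append(int(g))
        if len(selected) == k:
            break
    return selected
```

Called as `select_genes(X, y, k=10)`, the sketch returns up to ten column indices of `X`. Compared with plain top-k ranking, a near-duplicate of an already selected gene is skipped in favour of the next most informative gene that is sufficiently distinct.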
Publication Info
- Year: 2002
- Type: article
- Pages: 53-64
- Citations: 203
- Access: Closed
Identifiers
- DOI: 10.1142/9789812776303_0006