Abstract

In this paper we derive a method for evaluating and improving techniques for selecting informative genes from microarray data. Genes of interest are typically selected by ranking genes according to a test statistic and then choosing the top k genes. A problem with this approach is that many of the top-ranked genes are highly correlated with one another, so they carry largely redundant information. For classification purposes, it would be ideal to have genes that are distinct but still highly informative. We propose three pre-filter methods (two based on clustering and one based on correlation) that retrieve groups of similar genes. We then apply a test statistic within these groups to select the final genes of interest. We show that this filtered set of genes can be used to significantly improve existing classifiers.
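The contrast between plain top-k ranking and ranking after a redundancy filter can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a two-class problem and uses Welch's t-statistic as the test statistic. It also replaces the paper's clustering- and correlation-based pre-filters with a simpler greedy correlation filter. The function names, the threshold `max_corr=0.9`, and the toy data are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t-statistic (an assumed choice of test statistic)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


def rank_genes(expr, labels):
    """Gene indices sorted by |t|, most discriminative first.

    expr[g][s] is the expression of gene g in sample s; labels[s] is 0 or 1.
    Ties keep the lower gene index first because sorted() is stable.
    """
    scores = []
    for row in expr:
        a = [v for v, c in zip(row, labels) if c == 0]
        b = [v for v, c in zip(row, labels) if c == 1]
        scores.append(abs(welch_t(a, b)))
    return sorted(range(len(expr)), key=lambda g: scores[g], reverse=True)


def top_k(expr, labels, k):
    """Baseline: the k top-ranked genes, however correlated they are."""
    return rank_genes(expr, labels)[:k]


def top_k_decorrelated(expr, labels, k, max_corr=0.9):
    """Hypothetical greedy pre-filter (not the paper's exact method): walk down
    the ranking and keep a gene only if |r| with every kept gene is <= max_corr."""
    selected = []
    for g in rank_genes(expr, labels):
        if len(selected) >= k:
            break
        if all(abs(pearson(expr[g], expr[h])) <= max_corr for h in selected):
            selected.append(g)
    return selected


# Toy data invented for illustration: 3 genes x 6 samples, classes 0 and 1.
labels = [0, 0, 0, 1, 1, 1]
expr = [
    [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # gene 0: strongly up in class 1
    [2.0, 2.4, 1.8, 6.0, 6.2, 5.9],  # gene 1: nearly a scaled copy of gene 0
    [3.0, 1.0, 2.0, 4.0, 6.0, 5.0],  # gene 2: weaker, less redundant signal
]

print(top_k(expr, labels, 2))               # [1, 0]
print(top_k_decorrelated(expr, labels, 2))  # [1, 2]
```

On this toy data, plain top-k returns genes 1 and 0, which are nearly perfectly correlated. The filtered version skips gene 0 as redundant and keeps the lower-ranked but more distinct gene 2 instead.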

Keywords

Gene selection; Ranking (information retrieval); Cluster analysis; Selection (genetic algorithm); Gene; Computer science; Test statistic; Statistic; DNA microarray; Artificial intelligence; Biology; Computational biology; Machine learning; Statistical hypothesis testing; Mathematics; Genetics; Microarray analysis techniques; Statistics

Publication Info

Year: 2002
Type: article
Pages: 53-64
Citations: 203 (OpenAlex)
Access: Closed

Cite This

Jochen Jaeger, R. Sengupta, Walter L. Ruzzo (2002). Improved Gene Selection for Classification of Microarrays. pp. 53-64. https://doi.org/10.1142/9789812776303_0006

Identifiers

DOI
10.1142/9789812776303_0006