Abstract

Selecting a small subset of genes out of the thousands of genes in microarray data is important for accurate classification of phenotypes. Widely used methods typically rank genes according to their differential expression among phenotypes and pick the top-ranked genes. We observe that feature sets so obtained contain a certain amount of redundancy and study methods to minimize it. Feature sets obtained through the minimum redundancy-maximum relevance framework represent a broader spectrum of characteristics of phenotypes than those obtained through standard ranking methods; they are more robust, generalize well to unseen data, and lead to significantly improved classifications in extensive experiments on five gene expression data sets.
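
The contrast between plain ranking and redundancy-aware selection can be illustrated with a small sketch. The abstract does not state the exact criterion, so this example makes several assumptions: expression values are already discretized, relevance and redundancy are both measured with mutual information, and the two are combined by a simple difference (relevance minus mean redundancy with the genes already chosen). The function names `mutual_information` and `mrmr_select` are hypothetical and are not taken from the paper.

```python
import numpy as np

def mutual_information(x, y):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    # Joint probability table estimated from co-occurrence counts.
    joint = np.zeros((len(x_vals), len(y_vals)))
    np.add.at(joint, (x_idx, y_idx), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal of x, shape (|x|, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal of y, shape (1, |y|)
    nz = joint > 0                          # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

def mrmr_select(X, y, k):
    """Greedy selection of k >= 1 columns of X (samples x genes).

    Assumed criterion: relevance I(gene; class) minus the mean
    redundancy I(gene; already selected gene).
    """
    n_features = X.shape[1]
    relevance = np.array(
        [mutual_information(X[:, j], y) for j in range(n_features)]
    )
    # Start from the single most relevant gene, as plain ranking would.
    selected = [int(np.argmax(relevance))]
    redundancy_sum = np.zeros(n_features)
    while len(selected) < min(k, n_features):
        last = selected[-1]
        # Accumulate each gene's redundancy with the most recent pick.
        redundancy_sum += np.array(
            [mutual_information(X[:, j], X[:, last]) for j in range(n_features)]
        )
        score = relevance - redundancy_sum / len(selected)
        score[selected] = -np.inf           # never pick a gene twice
        selected.append(int(np.argmax(score)))
    return selected

# Toy data: gene 1 duplicates gene 0, gene 2 carries complementary
# information, and gene 3 is unrelated to the class label.
y = np.array([0, 0, 1, 1, 2, 2, 3, 3])
X = np.column_stack([
    y // 2,                      # gene 0: high bit of the class
    y // 2,                      # gene 1: exact copy of gene 0
    y % 2,                       # gene 2: low bit of the class
    np.tile([0, 1], 4),          # gene 3: independent of the class
])
print(mrmr_select(X, y, 2))      # prints [0, 2]
```

On this toy data, ranking by relevance alone would tie genes 0, 1, and 2 and could return the duplicated pair, whereas the redundancy penalty steers the second pick to gene 2, which together with gene 0 fully determines the class.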

Keywords

Redundancy (engineering), Minimum redundancy feature selection, Feature selection, Microarray analysis techniques, Computer science, Data mining, Gene, Phenotype, Pattern recognition (psychology), Ranking (information retrieval), Artificial intelligence, Feature (linguistics), Computational biology, Data redundancy, Biology, Genetics, Gene expression

Publication Info

Year: 2004
Type: article
Pages: 523-528
Citations: 496
Access: Closed

Cite This

C. Ding, Hujin Peng (2004). Minimum redundancy feature selection from microarray gene expression data, 523-528. https://doi.org/10.1109/csb.2003.1227396

Identifiers

DOI: 10.1109/csb.2003.1227396