Abstract

Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because of the difficulty in directly implementing the maximal dependency condition, we first derive an equivalent form, called minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection. Then, we present a two-stage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers). This allows us to select a compact set of superior features at very low cost. We perform extensive experimental comparison of our algorithm and other methods using three different classifiers (naive Bayes, support vector machine, and linear discriminant analysis) and four different data sets (handwritten digits, arrhythmia, NCI cancer cell lines, and lymphoma tissues). The results confirm that mRMR leads to promising improvement on feature selection and classification accuracy.
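The first-order incremental scheme described in the abstract can be sketched as a greedy loop: at each step, add the candidate feature x_i that maximizes its relevance I(x_i; c) to the class label minus its mean redundancy (1/|S|) Σ_{x_j ∈ S} I(x_i; x_j) with the already-selected set S. The following is a minimal illustrative sketch for discrete features (not the authors' implementation; the function names and the simple plug-in mutual-information estimator are our own choices):

```python
import numpy as np

def mutual_info(x, y):
    """Empirical (plug-in) mutual information, in nats, between two discrete 1-D arrays."""
    xs, x_idx = np.unique(x, return_inverse=True)
    ys, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((len(xs), len(ys)))
    for i, j in zip(x_idx, y_idx):       # accumulate joint counts
        joint[i, j] += 1
    joint /= joint.sum()                 # joint probabilities
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0                       # skip zero cells (0 * log 0 = 0)
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def mrmr(X, y, k):
    """Greedy first-order incremental mRMR selection of k feature indices.

    Each step picks the feature maximizing relevance I(x_i; y) minus
    mean redundancy with the already-selected set (the MID form).
    """
    n_features = X.shape[1]
    relevance = [mutual_info(X[:, i], y) for i in range(n_features)]
    selected = [int(np.argmax(relevance))]   # seed with the most relevant feature
    while len(selected) < k:
        best, best_score = None, -np.inf
        for i in range(n_features):
            if i in selected:
                continue
            redundancy = np.mean([mutual_info(X[:, i], X[:, j]) for j in selected])
            score = relevance[i] - redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```

On data containing an exact duplicate of an informative feature, this criterion passes over the duplicate in favor of a less redundant feature, which is the behavior the redundancy term is meant to produce.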

Keywords

Feature selection, Mutual information, Pattern recognition (psychology), Redundancy (engineering), Minimum redundancy feature selection, Artificial intelligence, Computer science, Support vector machine, Dependency (UML), Naive Bayes classifier, Feature (linguistics), Data mining, Machine learning

Publication Info

Year: 2005
Type: article
Volume: 27
Issue: 8
Pages: 1226-1238
Citations: 10050
Access: Closed

Citation Metrics

10050 citations (source: OpenAlex)

Cite This

Hanchuan Peng, Fuhui Long, Chris Ding (2005). Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), 1226-1238. https://doi.org/10.1109/tpami.2005.159

Identifiers

DOI
10.1109/tpami.2005.159