Choosing software metrics for defect prediction: an investigation on feature selection techniques

Kehan Gao; Taghi M. Khoshgoftaar; Huanjing Wang; Naeem Seliya

doi:10.1002/spe.1043

Abstract

The selection of software metrics for building software quality prediction models is a search‐based software engineering problem. An exhaustive search for such metrics is usually not feasible due to limited project resources, especially if the number of available metrics is large. Defect prediction models are necessary in aiding project managers for better utilizing valuable project resources for software quality improvement. The efficacy and usefulness of a fault‐proneness prediction model is only as good as the quality of the software measurement data. This study focuses on the problem of attribute selection in the context of software quality estimation. A comparative investigation is presented for evaluating our proposed hybrid attribute selection approach, in which feature ranking is first used to reduce the search space, followed by a feature subset selection. A total of seven different feature ranking techniques are evaluated, while four different feature subset selection approaches are considered. The models are trained using five commonly used classification algorithms. The case study is based on software metrics and defect data collected from multiple releases of a large real‐world software system. The results demonstrate that while some feature ranking techniques performed similarly, the automatic hybrid search algorithm performed the best among the feature subset selection methods. Moreover, performances of the defect prediction models either improved or remained unchanged when over 85% of the software metrics were eliminated. Copyright © 2011 John Wiley & Sons, Ltd.
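
The two-stage idea described in the abstract (rank features first to shrink the search space, then search for a subset among the survivors) can be sketched roughly as follows. This is only an illustration under stated assumptions: the absolute Pearson correlation ranker, the greedy forward search, the nearest-centroid evaluator, the toy data, and all function names are stand-ins chosen for brevity, not the ranking techniques, search algorithms, or classifiers evaluated in the paper.

```python
from statistics import mean

def abs_correlation(xs, ys):
    """Absolute Pearson correlation; returns 0.0 for a constant column."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5

def rank_features(X, y, top_k):
    """Stage 1 (assumed ranker): keep the top_k features by |correlation| with y."""
    scored = [(abs_correlation([row[j] for row in X], y), j)
              for j in range(len(X[0]))]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scored[:top_k]]

def centroid_accuracy(X, y, subset):
    """Toy evaluator: training accuracy of a nearest-centroid classifier."""
    labels = sorted(set(y))
    centroids = {}
    for label in labels:
        rows = [[row[j] for j in subset] for row, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    correct = 0
    for row, t in zip(X, y):
        point = [row[j] for j in subset]
        predicted = min(
            labels,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += predicted == t
    return correct / len(y)

def forward_subset_search(X, y, candidates):
    """Stage 2 (assumed search): greedily add the feature that most improves the score."""
    selected, best = [], -1.0
    while True:
        best_j = None
        for j in candidates:
            if j in selected:
                continue
            score = centroid_accuracy(X, y, selected + [j])
            if score > best:
                best, best_j = score, j
        if best_j is None:  # no candidate improved the score
            return selected, best
        selected.append(best_j)

def hybrid_select(X, y, top_k):
    """Rank first to reduce the search space, then search within the reduced set."""
    return forward_subset_search(X, y, rank_features(X, y, top_k))

# Hypothetical toy data: 8 modules, 4 metrics, binary fault-proneness label.
X = [
    [1.0, 5.0, 3.0, 0.2], [1.2, 1.0, 3.0, 0.9],
    [0.9, 4.0, 3.0, 0.4], [1.1, 2.0, 3.0, 0.1],
    [3.0, 5.0, 3.0, 0.8], [3.2, 1.5, 3.0, 0.3],
    [2.9, 4.0, 3.0, 0.7], [3.1, 2.0, 3.0, 0.6],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected, score = hybrid_select(X, y, top_k=2)
print(selected, score)
```

The ranking stage is cheap (one score per metric), so it can discard most metrics before the more expensive subset search runs a classifier for every candidate subset. In practice the evaluator would use held-out data or cross-validation rather than training accuracy.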

Keywords

Feature selection; Data mining; Ranking (information retrieval); Computer science; Software metric; Software; Feature (linguistics); Selection (genetic algorithm); Context (archaeology); Machine learning; Software quality; Software bug; Quality (philosophy); Artificial intelligence; Predictive modelling; Software development

Related Publications

Feature selection for high-dimensional genomic microarray data

Eric P. Xing , Michael I. Jordan , Richard M. Karp

We report on the successful application of feature selection methods to a classification problem in molecular biology involving only 72 data points in a 7130 dimensional space. ...

2001 628 citations

Evaluation of gene prediction software using a genomic data set: application to Arabidopsis thaliana sequences

Nathalie Pavy , Stéphane Rombauts , Patrice Déhais +4 more

Abstract Motivation: The annotation of the Arabidopsis thaliana genome remains a problem in terms of time and quality. To improve the annotation process, we want to choose the mo...

1999 Bioinformatics 107 citations

Feature selection: evaluation, application, and small sample performance

Anil K. Jain , Douglas E. Zongker

A large number of algorithms have been proposed for feature subset selection. Our experimental results show that the sequential forward floating selection algorithm, proposed by...

1997 IEEE Transactions on Pattern Analysis... 2147 citations

Multiple Aspect Ranking Using the Good Grief Algorithm

Benjamin Snyder , Regina Barzilay

We address the problem of analyzing multiple related opinions in a text. For instance, in a restaurant review such opinions may include food, ambience and service. We formulate ...

2007 316 citations

A Comparison of Selection Schemes Used in Evolutionary Algorithms

Tobias Blickle , Lothar Thiele

Evolutionary algorithms are a common probabilistic optimization method based on the model of natural evolution. One important operator in these algorithms is the selection schem...

1996 Evolutionary Computation 574 citations

Publication Info

Year: 2011
Type: article
Volume: 41
Issue: 5
Pages: 579-606
Citations: 281
Access: Closed

Citation Metrics

OpenAlex: 281
CrossRef: 210

Cite This

APA Style

Kehan Gao, Taghi M. Khoshgoftaar, Huanjing Wang et al. (2011). Choosing software metrics for defect prediction: an investigation on feature selection techniques. Software Practice and Experience, 41(5), 579-606. https://doi.org/10.1002/spe.1043

Identifiers

DOI: 10.1002/spe.1043
