Abstract
Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data and then taking a weighted majority vote of the sequence of classifiers thus produced. For many classification algorithms, this simple strategy results in dramatic improvements in performance. We show that this seemingly mysterious phenomenon can be understood in terms of well-known statistical principles, namely additive modeling and maximum likelihood. For the two-class problem, boosting can be viewed as an approximation to additive modeling on the logistic scale using maximum Bernoulli likelihood as a criterion. We develop more direct approximations and show that they exhibit nearly identical results to boosting. Direct multiclass generalizations based on multinomial likelihood are derived that exhibit performance comparable to other recently proposed multiclass generalizations of boosting in most situations, and far superior in some. We suggest a minor modification to boosting that can reduce computation, often by factors of 10 to 50. Finally, we apply these insights to produce an alternative formulation of boosting decision trees. This approach, based on best-first truncated tree induction, often leads to better performance, and can provide interpretable descriptions of the aggregate decision rule. It is also much faster computationally, making it more suitable to large-scale data mining applications.
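The reweight-and-vote procedure summarized in the abstract is Discrete AdaBoost. The sketch below is a minimal illustration, not the authors' code: it assumes scikit-learn decision stumps as the base classifier, and the function names `adaboost_fit` and `adaboost_predict` are invented for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """Discrete AdaBoost with decision stumps as the base classifier.

    X: (n_samples, n_features) array; y: labels in {-1, +1}.
    Returns a list of (vote weight, fitted stump) pairs.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)                   # start with uniform observation weights
    ensemble = []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)      # fit to the reweighted training data
        miss = stump.predict(X) != y
        err = np.dot(w, miss) / w.sum()       # weighted training error
        err = np.clip(err, 1e-10, 1 - 1e-10)  # guard against degenerate 0 or 1
        c = np.log((1.0 - err) / err)         # vote weight for this round
        w *= np.exp(c * miss)                 # upweight the misclassified points
        w /= w.sum()
        ensemble.append((c, stump))
    return ensemble

def adaboost_predict(ensemble, X):
    """Weighted majority vote: sign of the weighted sum of base predictions."""
    agg = sum(c * stump.predict(X) for c, stump in ensemble)
    return np.sign(agg)
```

The connection to additive logistic regression drawn in the paper comes from the exponential criterion this procedure effectively minimizes: the population minimizer of E[exp(-yF(x))] is F(x) = (1/2) log[P(y=1|x)/P(y=-1|x)], so the aggregate vote F(x) estimates half the log-odds, an additive model on the logistic scale.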
Related Publications
On a Method to Measure Supervised Multiclass Model’s Interpretability: Application to Degradation Diagnosis (Short Paper)
In an industrial maintenance context, degradation diagnosis is the problem of determining the current level of degradation of operating machines based on measurements. With the ...
Kernel Logistic Regression and the Import Vector Machine
The support vector machine (SVM) is known for its good performance in two-class classification, but its extension to multiclass classification is still an ongoing research issue...
Accounting for Uncertainty in the Tree Topology Has Little Effect on the Decision-Theoretic Approach to Model Selection in Phylogeny Estimation
Currently available methods for model selection used in phylogenetic analysis are based on an initial fixed-tree topology. Once a model is picked based on this topology, a rigor...
Explaining heterogeneity in meta-analysis: a comparison of methods
Exploring the possible reasons for heterogeneity between studies is an important aspect of conducting a meta-analysis. This paper compares a number of methods which can be used ...
jModelTest: Phylogenetic Model Averaging
jModelTest is a new program for the statistical selection of models of nucleotide substitution based on "Phyml" (Guindon and Gascuel 2003. A simple, fast, and accurate algorithm...
Publication Info
- Year: 2000
- Type: article
- Volume: 28
- Issue: 2
- Citations: 6819
- Access: Closed
Identifiers
- DOI: 10.1214/aos/1016218223