Abstract

Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data and then taking a weighted majority vote of the sequence of classifiers thus produced. For many classification algorithms, this simple strategy results in dramatic improvements in performance. We show that this seemingly mysterious phenomenon can be understood in terms of well-known statistical principles, namely additive modeling and maximum likelihood. For the two-class problem, boosting can be viewed as an approximation to additive modeling on the logistic scale using maximum Bernoulli likelihood as a criterion. We develop more direct approximations and show that they exhibit nearly identical results to boosting. Direct multiclass generalizations based on multinomial likelihood are derived that exhibit performance comparable to other recently proposed multiclass generalizations of boosting in most situations, and far superior in some. We suggest a minor modification to boosting that can reduce computation, often by factors of 10 to 50. Finally, we apply these insights to produce an alternative formulation of boosting decision trees. This approach, based on best-first truncated tree induction, often leads to better performance, and can provide interpretable descriptions of the aggregate decision rule. It is also much faster computationally, making it more suitable to large-scale data mining applications.
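As a concrete illustration of the reweighting-and-voting scheme the abstract describes, below is a minimal Python sketch of Discrete AdaBoost. It is not the authors' code: scikit-learn's DecisionTreeClassifier is assumed as the weak learner, labels are assumed to be coded as -1/+1, and the vote weight alpha = (1/2) log((1 - err)/err) follows the standard exponential-loss formulation that the paper connects to additive logistic modeling.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    # Discrete AdaBoost sketch; y is assumed coded as {-1, +1}.
    n = len(y)
    w = np.full(n, 1.0 / n)          # start with uniform observation weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)   # weak learner: a stump
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(np.sum(w[pred != y]), 1e-12, 1 - 1e-12)  # weighted error
        alpha = 0.5 * np.log((1.0 - err) / err)  # vote weight: half the log-odds
        w = w * np.exp(-alpha * y * pred)        # upweight misclassified points
        w = w / w.sum()                          # renormalize
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    # Weighted majority vote: the sign of the additive fit
    # F(x) = sum_m alpha_m f_m(x); a tie (F = 0) returns 0 here.
    F = sum(a * s.predict(X) for a, s in zip(stumps, alphas))
    return np.sign(F)

# Example usage on a toy problem (hypothetical data; labels recoded to -1/+1):
# from sklearn.datasets import make_classification
# X, y = make_classification(n_samples=200, random_state=0)
# stumps, alphas = adaboost_fit(X, 2 * y - 1)

The paper's central observation is that each round of this procedure performs one stage of a stagewise additive fit under exponential loss, whose population minimizer is half the log-odds, (1/2) log[P(y = 1 | x) / P(y = -1 | x)], which is why boosting can be read as approximate additive logistic regression.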

Keywords

Boosting (machine learning), Gradient boosting, Mathematics, Decision tree, Machine learning, Multinomial logistic regression, Artificial intelligence, Bernoulli's principle, Multinomial distribution, Logistic regression, Random forest, Computer science, Econometrics


Publication Info

Year: 2000
Type: article
Volume: 28
Issue: 2
Citations: 6819
Access: Closed


Citation Metrics

Citations (OpenAlex): 6819

Cite This

Jerome H. Friedman, Trevor Hastie, Robert Tibshirani (2000). Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). The Annals of Statistics, 28(2). https://doi.org/10.1214/aos/1016218223

Identifiers

DOI
10.1214/aos/1016218223