Quickly Boosting Decision Trees - Pruning Underachieving Features Early

Abstract

Boosted decision trees are one of the most popular and successful learning techniques used today. While exhibiting fast speeds at test time, relatively slow training makes them impractical for applications with real-time learning requirements. We propose a principled approach to overcome this drawback. We prove a bound on the error of a decision stump given its preliminary error on a subset of the training data; the bound may be used to prune unpromising features early on in the training process. We propose a fast training algorithm that exploits this bound, yielding speedups of an order of magnitude at no cost in the final performance of the classifier. Our method is not a new variant of Boosting; rather, it may be used in conjunction with existing Boosting algorithms and other sampling heuristics to achieve even greater speedups.
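The pruning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm; it is a simplified two-stage variant built on one elementary fact: because sample weights are nonnegative, the best unnormalized weighted error a stump achieves on a subset of the samples can never exceed the best it achieves on the full set. A feature whose subset error already matches or exceeds the best full-set error found so far can therefore be skipped without changing the result. The function names, the `subset_fraction` parameter, and the choice of the heaviest samples as the subset are all assumptions made for illustration.

```python
import numpy as np

def best_stump_error(x, y, w):
    """Minimum unnormalized weighted error of a threshold stump on one feature.

    x: feature values, y: labels in {-1, +1}, w: nonnegative sample weights.
    Both polarities and every cut between distinct sorted values are tried.
    """
    order = np.argsort(x, kind="stable")
    xs, ys, ws = x[order], y[order], w[order]
    # pos[k] / neg[k]: positive / negative weight among the first k sorted samples
    pos = np.concatenate(([0.0], np.cumsum(ws * (ys > 0))))
    neg = np.concatenate(([0.0], np.cumsum(ws * (ys < 0))))
    # Predict -1 left of the cut, +1 right of it:
    # mistakes are positives on the left plus negatives on the right.
    err = pos + (neg[-1] - neg)
    # A cut is only realizable between two distinct feature values.
    valid = np.concatenate(([True], xs[1:] != xs[:-1], [True]))
    err = err[valid]
    total = pos[-1] + neg[-1]
    # The opposite polarity makes exactly the complementary mistakes.
    return float(min(err.min(), (total - err).min()))

def train_stump_pruned(X, y, w, subset_fraction=0.1):
    """Find the best feature, skipping features ruled out by a subset bound.

    Hypothetical two-stage scheme: stage 1 scores every feature on the
    heaviest samples only; stage 2 evaluates features on the full data in
    order of increasing stage-1 error, stopping once the lower bound of the
    next feature is no better than the best full error found so far.
    """
    n, d = X.shape
    heavy = np.argsort(-w, kind="stable")            # heaviest samples first
    sub = heavy[:max(1, int(subset_fraction * n))]
    prelim = np.array(
        [best_stump_error(X[sub, j], y[sub], w[sub]) for j in range(d)]
    )
    best_err, best_j, evaluated = np.inf, -1, 0
    for j in np.argsort(prelim, kind="stable"):
        if prelim[j] >= best_err:
            break  # every remaining feature has a lower bound at least this large
        err = best_stump_error(X[:, j], y, w)
        evaluated += 1
        if err < best_err:
            best_err, best_j = err, int(j)
    return best_j, best_err, evaluated
```

The third return value counts how many features needed a full evaluation, which gives a rough sense of the work saved on a given dataset. Up to floating-point rounding, the returned error matches an exhaustive search over all features.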

Keywords

Boosting (machine learning), Computer science, Exploit, Machine learning, Artificial intelligence, Decision tree, Gradient boosting, Pruning, Classifier (UML), Training set, Random forest

Publication Info

Year: 2013
Type: article
Pages: 594-602
Citations: 123
Access: Closed

Citation Metrics

123 citations (OpenAlex)

Cite This

APA Style

Ron D. Appel, Thomas J. Fuchs, Piotr Dollár, et al. (2013). Quickly Boosting Decision Trees - Pruning Underachieving Features Early. The Caltech Institute Archives (California Institute of Technology), 594-602.