Approximate Splitting for Ensembles of Trees using Histograms

Abstract

Recent work in classification indicates that significant improvements in accuracy can be obtained by growing an ensemble of classifiers and having them vote for the most popular class. Implicit in many of these techniques is the concept of randomization, which is what generates the different classifiers. In this paper, we focus on ensembles of decision trees that are created using a randomized procedure based on histograms. Techniques such as histograms that discretize continuous variables have long been used in classification to convert the data into a form suitable for processing and to reduce the compute time. Our approach combines the ideas behind discretization through histograms and randomization in ensembles to create decision trees by randomly selecting a split point in an interval around the best bin boundary in the histogram. Experimental results with public domain data show that ensembles generated using this approach are competitive in accuracy and superior in computational cost to other ensemble techniques such as boosting and bagging.
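The split selection described in the abstract can be illustrated with a minimal Python sketch for a single continuous feature. This is not the authors' implementation. The equal-width bins, the Gini impurity criterion, the function name histogram_split, and the width parameter (half-width of the sampling interval, as a fraction of one bin width) are all assumptions made for illustration; the abstract specifies only that a split point is drawn at random from an interval around the best bin boundary.

```python
import random


def gini(counts):
    """Gini impurity of a list of per-class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def histogram_split(values, labels, n_bins=8, width=0.5, rng=None):
    """Return (best_boundary, random_split) for one feature, or None.

    Assumptions (not from the paper): equal-width bins, Gini impurity,
    and a sampling interval of +/- width * bin_width around the boundary.
    """
    rng = rng or random.Random()
    lo, hi = min(values), max(values)
    if lo == hi:
        return None  # constant feature: nothing to split on

    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    bin_width = (hi - lo) / n_bins

    # Build the histogram: one row of class counts per bin.
    hist = [[0] * len(classes) for _ in range(n_bins)]
    for v, y in zip(values, labels):
        b = min(int((v - lo) / bin_width), n_bins - 1)
        hist[b][index[y]] += 1

    # Score each interior bin boundary from cumulative class counts.
    total = [sum(col) for col in zip(*hist)]
    left = [0] * len(classes)
    n = len(values)
    best_score, best_k = float("inf"), None
    for k in range(1, n_bins):
        left = [l + h for l, h in zip(left, hist[k - 1])]
        right = [t - l for t, l in zip(total, left)]
        n_left, n_right = sum(left), sum(right)
        if n_left == 0 or n_right == 0:
            continue
        score = (n_left * gini(left) + n_right * gini(right)) / n
        if score < best_score:
            best_score, best_k = score, k
    if best_k is None:
        return None

    # Randomize: draw the actual split from an interval around the boundary.
    boundary = lo + best_k * bin_width
    half = width * bin_width
    return boundary, rng.uniform(boundary - half, boundary + half)


if __name__ == "__main__":
    xs = [0.1, 0.2, 0.3, 0.4, 1.6, 1.7, 1.8, 1.9]
    ys = [0, 0, 0, 0, 1, 1, 1, 1]
    print(histogram_split(xs, ys, n_bins=4, rng=random.Random(0)))
```

In this sketch only the histogram is scanned to find the boundary, rather than every sorted value of the feature, which is where the reduction in compute time would come from. Drawing the final split point at random is what makes trees built from the same data differ from one another.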

Keywords

Boosting (machine learning), Histogram, Decision tree, Computer science, Discretization, Artificial intelligence, Bin, Machine learning, Decision boundary, Pattern recognition (psychology), Domain (mathematical analysis), Class (philosophy), Data mining, Mathematics, Algorithm, Classifier (UML)

Related Publications

Bagging, boosting, and C4.5

J. R. Quinlan

Breiman's bagging and Freund and Schapire's boosting are recent methods for improving the predictive power of classifier learning systems. Both form a set of classifiers that ar...

1996 National Conference on Artificial Int... 1262 citations

A Communication-Efficient Parallel Algorithm for Decision Tree

Qi Meng, Guolin Ke, Taifeng Wang +4 more

Decision tree (and its extensions such as Gradient Boosting Decision Trees and Random Forest) is a widely used machine learning algorithm, due to its practical effectiveness and...

2016 arXiv (Cornell University) 69 citations

Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors)

Jerome H. Friedman, Trevor Hastie, Robert Tibshirani

Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versi...

2000 The Annals of Statistics 6819 citations

Analyzing bagging

Peter Bühlmann, Bin Yu

Bagging is one of the most effective computationally intensive procedures to improve on unstable estimators or classifiers, useful especially for high dimensional data set probl...

2002 The Annals of Statistics 564 citations

Parallel boosted regression trees for web search ranking

Stephen Tyree, Kilian Q. Weinberger, Kunal Agrawal +1 more

Gradient Boosted Regression Trees (GBRT) are the current state-of-the-art learning paradigm for machine learned web-search ranking - a domain notorious for very large data sets....

2011 164 citations

Publication Info

Year: 2002
Type: article
Pages: 370-383
Citations: 6
Access: Closed

Cite This

APA Style

Chandrika Kamath, Erick Cantú‐Paz, David Littau (2002). Approximate Splitting for Ensembles of Trees using Histograms, 370-383. https://doi.org/10.1137/1.9781611972726.22

Identifiers

DOI: 10.1137/1.9781611972726.22