Abstract
Recent work in classification indicates that significant improvements in accuracy can be obtained by growing an ensemble of classifiers and having them vote for the most popular class. Implicit in many of these techniques is the concept of randomization that generates different classifiers. In this paper, they focus on ensembles of decision trees that are created using a randomized procedure based on histograms. Techniques, such as histograms, that discretize continuous variables, have long been used in classification to convert the data into a form suitable for processing and to reduce the compute time. The approach combines the ideas behind discretization through histograms and randomization in ensembles to create decision trees by randomly selecting a split point in an interval around the best bin boundary in the histogram. The experimental results with public domain data show that ensembles generated using this approach are competitive in accuracy and superior in computational cost to other ensembles techniques such as boosting and bagging.
Keywords
Related Publications
Bagging, boosting, and C4.S
Breiman's bagging and Freund and Schapire's boosting are recent methods for improving the predictive power of classifier learning systems. Both form a set of classifiers that ar...
A Communication-Efficient Parallel Algorithm for Decision Tree
Decision tree (and its extensions such as Gradient Boosting Decision Trees and Random Forest) is a widely used machine learning algorithm, due to its practical effectiveness and...
Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors)
Boosting is one of the most important recent developments in\nclassification methodology. Boosting works by sequentially applying a\nclassification algorithm to reweighted versi...
Analyzing bagging
Bagging is one of the most effective computationally intensive procedures to improve on unstable estimators or classifiers, useful especially for high dimensional data set probl...
Parallel boosted regression trees for web search ranking
Gradient Boosted Regression Trees (GBRT) are the current state-of-the-art learning paradigm for machine learned web-search ranking - a domain notorious for very large data sets....
Publication Info
- Year
- 2002
- Type
- article
- Pages
- 370-383
- Citations
- 6
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1137/1.9781611972726.22