Abstract

Classification of very large datasets has many practical applications in data mining. Techniques such as discretization and dataset sampling can be used to scale decision tree classifiers up to large datasets. Unfortunately, both techniques can cause a significant loss in accuracy. We present a novel decision tree classifier called CLOUDS, which samples the splitting points for numeric attributes and then applies an estimation step to narrow the search space for the best split. CLOUDS substantially reduces computation and I/O complexity compared to state-of-the-art classifiers, while maintaining the quality of the generated trees in terms of accuracy and tree size. We provide experimental results on a number of real and synthetic datasets.
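
The two-phase idea in the abstract (score a small set of sampled split points, then narrow the search for the best split) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' method: the abstract does not name a splitting criterion, so Gini impurity is used here; the function names are hypothetical; and the narrowing rule, which searches only between the neighbours of the best sampled boundary, is a simple stand-in for the paper's estimation step.

```python
import random
from collections import Counter


def gini(counts, total):
    """Gini impurity of a class-count mapping holding `total` records."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def split_gini(values, labels, threshold):
    """Weighted Gini impurity of the binary split `value <= threshold`."""
    left, right = Counter(), Counter()
    for v, label in zip(values, labels):
        (left if v <= threshold else right)[label] += 1
    n_left, n_right = sum(left.values()), sum(right.values())
    n = n_left + n_right
    return (n_left / n) * gini(left, n_left) + (n_right / n) * gini(right, n_right)


def sampled_split(values, labels, num_intervals=4, sample_size=100, seed=0):
    """Return (threshold, gini) for one numeric attribute.

    Assumes `values` is a non-empty list and `num_intervals >= 2`.
    """
    rng = random.Random(seed)
    sample = sorted(rng.sample(values, min(sample_size, len(values))))

    # Step 1: interval boundaries are approximate quantiles of the sample.
    boundaries = sorted(
        {sample[(i * len(sample)) // num_intervals] for i in range(1, num_intervals)}
    )
    scores = {b: split_gini(values, labels, b) for b in boundaries}
    best = min(boundaries, key=scores.get)

    # Step 2 (simplified narrowing, an assumption of this sketch): search
    # exhaustively only between the neighbours of the best boundary.
    k = boundaries.index(best)
    lo = boundaries[k - 1] if k > 0 else float("-inf")
    hi = boundaries[k + 1] if k + 1 < len(boundaries) else float("inf")
    candidates = sorted({v for v in values if lo <= v <= hi})

    threshold = min(candidates, key=lambda t: split_gini(values, labels, t))
    return threshold, split_gini(values, labels, threshold)


values = list(range(100))
labels = ["a" if v < 37 else "b" for v in values]
print(sampled_split(values, labels))  # (36, 0.0)
```

In this toy run only three boundaries are scored over the full data before the exact search is confined to about half of the value range. The sketch rescans the data for every candidate threshold for readability; a practical implementation would accumulate class counts per interval in a single pass. The adjacent-interval rule can also miss the true optimum when it lies elsewhere, which is the kind of error an estimation step is meant to control.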

Keywords

Decision tree, Computer science, Classifier (UML), Decision tree learning, Computation, Data mining, Incremental decision tree, Artificial intelligence, Tree (set theory), Machine learning, Pattern recognition (psychology), Logistic model tree, Algorithm, Mathematics

Publication Info

Year: 1998
Type: article
Pages: 2-8
Citations: 148
Access: Closed

Citation Metrics

148 (OpenAlex)

Cite This

Khaled Alsabti, Sanjay Ranka, Vineet Kumar Singh (1998). CLOUDS: a decision tree classifier for large datasets. Syracuse University Libraries (Syracuse University), 2-8.