Abstract

KDD Cup 2010 is an educational data mining competition. Participants are asked to learn a model from students' past behavior and then predict their future performance. At National Taiwan University, we organized a course for this competition. Most student sub-teams expanded features using various binarization and discretization techniques. Logistic regression models (using LIBLINEAR) were trained on the resulting sparse feature sets. One sub-team considered condensed features derived with simple statistical techniques and applied Random Forest (through Weka) for training. Initial development was conducted on an internal split of the training data into training and validation sets. We identified some useful feature combinations that improved performance. For the final submission, we combined the results of the student sub-teams by regularized linear regression. Our team won first prize in both tracks (all teams and student teams) of KDD Cup 2010.
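
The final combination step, blending sub-team predictions by regularized linear regression, can be illustrated with a minimal sketch. It is not the authors' code: it assumes an L2 (ridge) penalty solved in closed form, uses synthetic stand-in predictions, and the names `fit_ridge_ensemble`, `predict_ensemble`, `rmse`, and the value `lam=1.0` are hypothetical choices made for illustration only.

```python
import numpy as np


def fit_ridge_ensemble(preds, y, lam=1.0):
    """Fit blending weights by ridge regression in closed form.

    preds: (n_samples, n_models) predictions of each sub-team model.
    y:     (n_samples,) true 0/1 labels on the same rows.
    lam:   L2 regularization strength (hypothetical value).
    """
    # Append a bias column so the blend can shift its mean.
    X = np.hstack([preds, np.ones((preds.shape[0], 1))])
    # Penalize the model weights but leave the bias unpenalized.
    penalty = lam * np.eye(X.shape[1])
    penalty[-1, -1] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)


def predict_ensemble(weights, preds):
    """Apply the fitted weights and clip the blend to [0, 1]."""
    X = np.hstack([preds, np.ones((preds.shape[0], 1))])
    return np.clip(X @ weights, 0.0, 1.0)


def rmse(y, p):
    """Root mean squared error between labels and predictions."""
    return float(np.sqrt(np.mean((y - p) ** 2)))


# Synthetic stand-in data: three noisy models estimating the same probability.
rng = np.random.default_rng(0)
truth = rng.random(2000)
y = (rng.random(2000) < truth).astype(float)
preds = np.column_stack(
    [np.clip(truth + rng.normal(0, s, 2000), 0, 1) for s in (0.2, 0.3, 0.4)]
)

# Fit the blend on one half and evaluate it on the other half.
fit_rows, eval_rows = slice(0, 1000), slice(1000, 2000)
weights = fit_ridge_ensemble(preds[fit_rows], y[fit_rows], lam=1.0)
blend = predict_ensemble(weights, preds[eval_rows])

print("blend RMSE:", rmse(y[eval_rows], blend))
for j in range(preds.shape[1]):
    print(f"model {j} RMSE:", rmse(y[eval_rows], preds[eval_rows, j]))
```

Fitting the weights on rows that the blend is not evaluated on mirrors the internal training and validation split mentioned in the abstract. In practice the regularization strength would be tuned on held-out data rather than fixed.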

Keywords

Random forest, Artificial intelligence, Computer science, Feature engineering, Machine learning, Feature (linguistics), Classifier (UML), Logistic regression, Discretization, Data mining, Pattern recognition (psychology), Deep learning, Mathematics

Publication Info

Year
2010
Type
article
Citations
128
Access
Closed

Citation Metrics

128 (OpenAlex)

Cite This

Hsiang-Fu Yu, Hung-Yi Lo, Hsun-Ping Hsieh et al. (2010). Feature Engineering and Classifier Ensemble for KDD Cup 2010.