Abstract

Semi-supervised learning (SSL) methods aim to reduce the amount of labeled training data required by learning from both labeled and unlabeled instances. Macskassy and Provost (2007) proposed the weighted-vote relational neighbor classifier (wvRN) as a simple yet effective baseline for semi-supervised learning on network data; it resembles many recent graph-based SSL methods, has been shown to be essentially equivalent to the Gaussian-field harmonic functions classifier of Zhu et al. (2003), and performs very well on several benchmark network datasets. We describe another simple and intuitive semi-supervised learning method, based on random graph walks, that outperforms wvRN by a large margin on several benchmark datasets when very few labels are available. Additionally, we show that using authoritative instances as training seeds (instances that arguably cost much less to label) dramatically reduces the amount of labeled data required to achieve the same classification accuracy; for some existing state-of-the-art semi-supervised learning methods, the labeled data needed is reduced by a factor of 50.
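The abstract does not include an implementation, so the following is only a minimal Python sketch of a random-walk-with-restart classifier of the general kind described: each class's labeled instances seed a restart distribution, and every node is assigned the class whose walk gives it the highest score. The dense numpy adjacency matrix, function names, damping factor, convergence settings, and one-seed-per-class toy example are illustrative assumptions, not the authors' method or code.

    # Sketch: random-walk-with-restart (personalized PageRank) classification
    # over a network, seeded by a few labeled nodes per class. Illustrative only.
    import numpy as np

    def rwr_scores(A, seed_idx, damping=0.85, tol=1e-8, max_iter=1000):
        """Score every node by a random walk that restarts at the labeled seeds."""
        n = A.shape[0]
        # Column-normalize the adjacency matrix to get a transition matrix.
        deg = A.sum(axis=0)
        deg[deg == 0] = 1.0          # avoid division by zero for isolated nodes
        P = A / deg
        # Restart (teleport) vector: uniform over the labeled seed nodes.
        r = np.zeros(n)
        r[seed_idx] = 1.0 / len(seed_idx)
        v = r.copy()
        for _ in range(max_iter):    # power iteration until convergence
            v_next = damping * (P @ v) + (1 - damping) * r
            if np.abs(v_next - v).sum() < tol:
                return v_next
            v = v_next
        return v

    def classify(A, seeds_by_class):
        """Assign each node the class whose seeded walk gives it the highest score."""
        labels = list(seeds_by_class)
        scores = np.vstack([rwr_scores(A, seeds_by_class[c]) for c in labels])
        return [labels[i] for i in scores.argmax(axis=0)]

    # Toy example: two triangles joined by one edge, one labeled seed per class.
    A = np.array([
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ], dtype=float)
    print(classify(A, {"A": [0], "B": [5]}))

With more labeled seeds per class, the restart vector simply spreads its mass over them; the damping factor controls how far influence propagates from the seeds before restarting, which is the main tuning choice in a sketch like this.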

Keywords

Computer science, Artificial intelligence, Semi-supervised learning, Labeled data, Machine learning, Supervised learning, Classifier (UML), Margin (machine learning), Graph, Benchmark (surveying), Pattern recognition (psychology), Artificial neural network, Theoretical computer science

Publication Info

Year: 2010
Type: article
Pages: 192–199
Citations: 93 (OpenAlex)
Access: Closed

Cite This

Frank Lin, William W. Cohen (2010). Semi-Supervised Classification of Network Data Using Very Few Labels. 2010 International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 192–199. https://doi.org/10.1109/asonam.2010.19

Identifiers

DOI: 10.1109/asonam.2010.19