Abstract
WHIRL is an extension of relational databases that can perform “soft joins” based on the similarity of textual identifiers; these soft joins extend the traditional operation of joining tables based on the equivalence of atomic values. This paper evaluates WHIRL on a number of inductive classification tasks using data from the World Wide Web. We show that although WHIRL is designed for more general similarity-based reasoning tasks, it is competitive with mature inductive classification systems on these classification tasks. In particular, WHIRL generally achieves lower generalization error than C4.5, RIPPER, and several nearest-neighbor methods. WHIRL is also fast: up to 500 times faster than C4.5 on some benchmark problems. We also show that WHIRL can be efficiently used to select from a large pool of unlabeled items those that can be classified correctly with high confidence.
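To make the idea of a soft join concrete, the following is a minimal sketch, not WHIRL's actual implementation. It assumes that similarity between textual identifiers is measured as the cosine of TF-IDF vectors, and that the scores of several similar labeled examples are combined with a noisy-or; the abstract states neither detail. All function names (`tfidf_vectors`, `soft_join`, `classify`) and the sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(texts):
    """Map each string to a unit-length TF-IDF vector stored as a dict."""
    token_lists = [t.lower().split() for t in texts]
    n_docs = len(token_lists)
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        # Smoothed IDF (assumption) keeps every weight strictly positive.
        vec = {tok: (1 + math.log(c)) * math.log((n_docs + 1) / doc_freq[tok])
               for tok, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({tok: w / norm for tok, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Dot product of two unit-length sparse vectors."""
    return sum(w * v[tok] for tok, w in u.items() if tok in v)


def soft_join(left, right, threshold=0.0):
    """Pair up strings whose similarity exceeds threshold, best match first."""
    vectors = tfidf_vectors(left + right)
    lvecs, rvecs = vectors[:len(left)], vectors[len(left):]
    pairs = [(l, r, cosine(lv, rv))
             for l, lv in zip(left, lvecs)
             for r, rv in zip(right, rvecs)]
    return sorted((p for p in pairs if p[2] > threshold),
                  key=lambda p: p[2], reverse=True)


def classify(query, labeled, k=3):
    """Label a query from its k most similar labeled texts.

    Per-label evidence is combined as 1 - prod(1 - score), a noisy-or
    (assumption); the combined value doubles as a confidence score.
    """
    texts = [text for text, _ in labeled]
    vectors = tfidf_vectors(texts + [query])
    qvec = vectors[-1]
    # zip stops at len(labeled), so the query vector itself is excluded.
    scored = sorted(((cosine(qvec, vec), label)
                     for vec, (_, label) in zip(vectors, labeled)),
                    reverse=True)[:k]
    miss = defaultdict(lambda: 1.0)
    for score, label in scored:
        miss[label] *= 1.0 - min(score, 1.0)
    best = max(miss, key=lambda lab: 1.0 - miss[lab])
    return best, 1.0 - miss[best]


if __name__ == "__main__":
    companies = ["Bell Labs Research", "Carnegie Mellon University"]
    listings = ["AT&T Bell Labs", "Carnegie Mellon Univ.", "Stanford University"]
    for left, right, score in soft_join(companies, listings, threshold=0.2):
        print(f"{left!r} ~ {right!r}: {score:.3f}")
```

Under these assumptions, an ordinary equality join would match none of the sample strings, whereas the soft join ranks near-matches by score. The confidence returned by `classify` could be thresholded to pick out, from a pool of unlabeled items, those most likely to be classified correctly.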
Related Publications
Supervised Multimodal Bitransformers for Classifying Images and Text
Self-supervised bidirectional transformer models such as BERT have led to dramatic improvements in a wide variety of textual classification tasks. The modern digital world is in...
The multiscale classifier
Proposes a rule-based inductive learning algorithm called multiscale classification (MSC). It can be applied to any N-dimensional real or binary classification problem to classi...
Solving Multiclass Learning Problems via Error-Correcting Output Codes
Multiclass learning problems involve finding a definition for an unknown function f(x) whose range is a discrete set containing k > 2 values (i.e., k "classes"). The definitio...
*SEM 2013 shared task: Semantic Textual Similarity
In Semantic Textual Similarity (STS), systems rate the degree of semantic equivalence, on a graded scale from 0 to 5, with 5 being the most similar. This year we set up two task...
Supervised classification of human microbiota
Recent advances in DNA sequencing technology have allowed the collection of high-dimensional data from human-associated microbial communities on an unprecedented scale. A major ...
Publication Info
- Year: 1998
- Type: article
- Pages: 169-173
- Citations: 105
- Access: Closed