Abstract
We present a method for discovering informative patterns from data. With this method, large databases can be reduced to only a few representative data entries. Our framework also encompasses methods for cleaning databases containing corrupted data. Both on-line and off-line algorithms are proposed and experimentally checked on databases of handwritten images. The generality of the framework makes it an attractive candidate for new applications in knowledge discovery.
Keywords
knowledge discovery, machine learning, informative patterns, data cleaning, information gain
Publication Info
- Year: 1996
- Type: article
- Pages: 181-203
- Citations: 234
- Access: Closed