Discovering informative patterns and data cleaning

Abstract

We present a method for discovering informative patterns from data. With this method, large databases can be reduced to only a few representative data entries. Our framework also encompasses methods for cleaning databases containing corrupted data. Both on-line and off-line algorithms are proposed and experimentally checked on databases of handwritten images. The generality of the framework makes it an attractive candidate for new applications in knowledge discovery.

Keywords: knowledge discovery, machine learning, informative patterns, data cleaning, information gain.

Keywords

Generality, Computer science, Data mining, Data modeling, Line (geometry), Information retrieval, Artificial intelligence, Database

Publication Info

Year: 1996
Type: article
Pages: 181-203
Citations: 234
Access: Closed

Citation Metrics

234 citations (OpenAlex)

Cite This

APA Style

Isabelle Guyon, Nada Matic, Vladimir Vapnik (1996). Discovering informative patterns and data cleaning. pp. 181-203.