Set-oriented mining for association rules in relational databases

Abstract

We describe set-oriented algorithms for mining association rules. Such algorithms require multiple joins and may therefore appear inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss the optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. SETM uses only simple database primitives, viz. sorting and merge-scan join. SETM is simple, fast and stable over the range of parameter values. The major contribution of this paper is that it shows that at least some aspects of data mining can be carried out by using general query languages such as SQL, rather than by developing specialized black-box algorithms. The set-oriented nature of SETM facilitates the development of extensions.
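
To make the idea of mining with plain SQL concrete, here is a minimal sketch of a set-oriented frequent-pair count, run through Python's standard `sqlite3` module. It is not the paper's SETM formulation: the table name `sales`, its columns `trans_id` and `item`, the helper `frequent_pairs`, and the toy data are all assumptions made for illustration. The sketch only shows that candidate generation and support counting can be written as a self-join followed by a grouped count.

```python
import sqlite3

# Hypothetical toy data: (transaction id, item) pairs.
ROWS = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "A"), (2, "B"),
    (3, "A"), (3, "D"),
    (4, "B"), (4, "C"),
    (5, "A"), (5, "B"), (5, "C"),
]

# Illustrative query, not taken from the paper.
# 1. frequent_items: items that occur in at least :minsup transactions.
# 2. filtered: transaction rows restricted to those frequent items.
# 3. Self-join on trans_id to form item pairs, then count their support.
PAIR_QUERY = """
WITH frequent_items AS (
    SELECT item
    FROM sales
    GROUP BY item
    HAVING COUNT(DISTINCT trans_id) >= :minsup
),
filtered AS (
    SELECT DISTINCT s.trans_id, s.item
    FROM sales AS s
    JOIN frequent_items AS f ON f.item = s.item
)
SELECT p.item AS item1, q.item AS item2, COUNT(*) AS support
FROM filtered AS p
JOIN filtered AS q
  ON q.trans_id = p.trans_id AND q.item > p.item
GROUP BY p.item, q.item
HAVING COUNT(*) >= :minsup
ORDER BY p.item, q.item
"""


def frequent_pairs(rows, min_support):
    """Return {(item1, item2): support} for pairs meeting min_support."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (trans_id INTEGER, item TEXT)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(PAIR_QUERY, {"minsup": min_support})
        return {(a, b): n for a, b, n in cursor}
    finally:
        conn.close()


if __name__ == "__main__":
    print(frequent_pairs(ROWS, 3))  # {('A', 'B'): 3, ('B', 'C'): 3}
```

The condition `q.item > p.item` keeps each pair once, in lexicographic order, so the grouped count is the number of transactions containing both items. Longer itemsets could be built by joining the surviving pairs back to `filtered` in the same way, but how the database engine executes these joins (for example by sorting and merge-scan join) is left to its query planner here.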

Keywords

Computer science, Joins, SQL, Relational database, Merge (version control), Association rule learning, Set (abstract data type), Data mining, Result set, Sorting, Database, Theoretical computer science, Information retrieval, Algorithm, Programming language

Publication Info

Year: 1995
Type: article
Pages: 25-33
Citations: 270
Access: Closed

Cite This

APA Style

M.A.W. Houtsma, A. Swami (1995). Set-oriented mining for association rules in relational databases, 25-33. https://doi.org/10.1109/icde.1995.380413

Identifiers

DOI: 10.1109/icde.1995.380413