Mining frequent patterns without candidate generation

Abstract

Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been widely studied in data mining research. Most previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there are prolific patterns and/or long patterns. In this study, we propose a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a highly condensed, much smaller data structure, which avoids costly, repeated database scans; (2) our FP-tree-based mining adopts a pattern fragment growth method to avoid the costly generation of a large number of candidate sets; and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported frequent pattern mining methods.
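The abstract's three ideas (a compressed prefix tree, pattern fragment growth, and divide-and-conquer over conditional databases) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `FPNode`, `build_fp_tree`, `fp_growth`, and `mine` are hypothetical, transactions are assumed to be lists of comparable items such as strings, and each conditional database is simply rebuilt into a fresh tree, with none of the optimizations the paper may describe.

```python
from collections import defaultdict


class FPNode:
    """One node of the prefix tree: an item, a count, and a parent link."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(weighted_transactions, min_support):
    """Build a tree from a list of (items, count) pairs using two scans."""
    # Scan 1: count the support of every single item.
    support = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}

    root = FPNode(None, None)
    header = defaultdict(list)  # item -> all tree nodes carrying that item

    # Scan 2: insert the frequent items of each transaction, most frequent
    # first, so that transactions sharing a prefix share tree nodes.
    for items, count in weighted_transactions:
        ordered = sorted(
            (i for i in set(items) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return root, header, frequent


def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Yield (itemset, support) pairs by growing patterns from `suffix`."""
    weighted_transactions = list(weighted_transactions)
    _, header, frequent = build_fp_tree(weighted_transactions, min_support)
    # Process the least frequent items first.
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        pattern = suffix | {item}
        yield pattern, frequent[item]
        # Conditional database: the prefix path above each node of `item`,
        # weighted by that node's count.
        conditional = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        # Divide and conquer: mine the smaller conditional database.
        yield from fp_growth(conditional, min_support, pattern)


def mine(transactions, min_support):
    """Return {frozenset(itemset): support} for plain transactions."""
    return dict(fp_growth([(t, 1) for t in transactions], min_support))
```

As a usage example, calling `mine([["bread", "milk"], ["bread", "butter", "milk"], ["bread", "butter"], ["milk"]], 2)` returns the three single items plus the pairs {bread, milk} and {bread, butter}, each with support 2, without ever enumerating candidate itemsets such as {butter, milk}.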

Keywords

Computer science, Data mining, Scalability, Set (abstract data type), Tree (set theory), GSP Algorithm, Trie, Apriori algorithm, Association rule learning, Database transaction, Data structure, Database, Mathematics

Affiliated Institutions

Simon Fraser University CA

Related Publications

An effective hash-based algorithm for mining association rules

Jong Soo Park, Ming-Syan Chen, Philip S. Yu

In this paper, we examine the issue of mining association rules among items in a large database of sales transactions. The mining of association rules can be mapped into the pro...

1995 1410 citations

Efficient mining of partial periodic patterns in time series database

Jiawei Han, Guozhu Dong, Yiwen Yin

Partial periodicity search, i.e., search for partial periodic patterns in time-series databases, is an interesting data mining problem. Previous studies on periodicity search ma...

1999 581 citations

Discovering informative patterns and data cleaning

Isabelle Guyon, Nada Matic, Vladimir Vapnik

We present a method for discovering informative patterns from data. With this method, large databases can be reduced to only a few representative data entries. Our framework als...

1996 234 citations

Efficiently mining long patterns from databases

Roberto J. Bayardo

We present a pattern-mining algorithm that scales roughly linearly in the number of maximal patterns embedded in a database irrespective of the length of the longest pattern. In...

1998 1297 citations

Introduction to Data Mining

Pang‐Ning Tan, Michael M. Steinbach, Vipin Kumar

1 Introduction 1.1 What is Data Mining? 1.2 Motivating Challenges 1.3 The Origins of Data Mining 1.4 Data Mining Tasks 1.5 Scope and Organization of the Book 1.6 Bibliog...

2008 7026 citations

Publication Info

Year: 2000
Type: article
Volume: 29
Issue: 2
Pages: 1-12
Citations: 6285
Access: Closed

Cite This

APA Style

Jiawei Han, Jian Pei, Yiwen Yin (2000). Mining frequent patterns without candidate generation. ACM SIGMOD Record, 29(2), 1-12. https://doi.org/10.1145/335191.335372

Identifiers

DOI: 10.1145/335191.335372