Abstract
We introduce a new kind of patterns, called emerging patterns (EPs), for knowledge discovery from databases. EPs are defined as itemsets whose supports increase significantly from one dataset to another. EPs can capture emerging trends in timestamped databases, or useful contrasts between data classes. EPs have been proven useful: we have used them to build very powerful classifiers, which are more accurate than C4.5 and CBA, for many datasets. We believe that EPs with low to medium support, such as 1%-20%, can give useful new insights and guidance to experts, in even “well understood” applications. The efficient mining of EPs is a challenging problem, since (i) the Apriori property no longer holds for EPs, and (ii) there are usually too many candidates for high dimensional databases or for small support thresholds such as 0.5%. Naive algorithms are too costly. To solve this problem, (a) we promote the description of large collections of itemsets using their concise borders (the pair of sets of the minimal and of the maximal itemsets in the collections). (b) We design EP mining algorithms which manipulate only borders of collections (especially using our multiborder- differential algorithm), and which represent discovered EPs using borders. All EPs satisfying a constraint can be efficiently discovered by our border-based algorithms, which take the borders, derived by Max-Miner, of large itemsets as inputs. In our experiments on large and high dimensional datasets including the US census and Mushroom datasets, many EPs, including some with large cardinality, are found quickly. We also give other algorithms for discovering general or special types of EPs.
Keywords
Affiliated Institutions
Related Publications
An effective hash-based algorithm for mining association rules
In this paper, we examine the issue of mining association rules among items in a large database of sales transactions. The mining of association rules can be mapped into the pro...
Efficient mining of partial periodic patterns in time series database
Partial periodicity search, i.e., search for partial periodic patterns in time-series databases, is an interesting data mining problem. Previous studies on periodicity search ma...
Data Mining: the search for knowledge in databases.
Data mining is the search for relationships and global patterns that exist in large databases, but are `hidden' among the vast amounts of data, such as a relationship b...
Discovering informative patterns and data cleaning
We present a method for discovering informative patterns from data. With this method, large databases can be reduced to only a few representative data entries. Our framework als...
A Survey on Network Embedding
Network embedding assigns nodes in a network to low-dimensional representations and effectively preserves the network structure. Recently, a significant amount of progresses hav...
Publication Info
- Year
- 1999
- Type
- article
- Pages
- 43-52
- Citations
- 1004
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1145/312129.312191