Mining compressed frequent-pattern sets
Related papers
On compressing frequent patterns
Data & Knowledge Engineering, 2007
A major challenge in frequent-pattern mining is the sheer size of its mining results. To compress the frequent patterns, we propose to cluster frequent patterns with a tightness measure δ (called δ-cluster) and to select a representative pattern for each cluster. The problem of finding a minimum set of representative patterns is shown to be NP-hard. We develop two greedy methods, RPglobal and RPlocal. The former has a guaranteed compression bound but higher computational complexity; the latter sacrifices the theoretical bound but is far more efficient. Our performance study shows that the compression quality of RPlocal is very close to that of RPglobal, and both can reduce the number of closed frequent patterns by almost two orders of magnitude. Furthermore, RPlocal mines even faster than FPClose [10], a very fast closed frequent-pattern mining method. We also show that RPglobal and RPlocal can be combined to balance quality and efficiency.
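To make the δ-cluster idea concrete, here is a minimal Python sketch of greedy representative selection. It assumes (our assumption, not stated in the abstract) that the distance between two patterns is the Jaccard distance between their supporting transaction-id sets and that a representative must contain every pattern it covers. The greedy set-cover loop only resembles the spirit of RPglobal and is not the authors' implementation; `covers` and `greedy_representatives` are hypothetical names.

```python
from typing import Dict, FrozenSet, List, Set

Pattern = FrozenSet[str]

def distance(t1: Set[int], t2: Set[int]) -> float:
    """Assumed pattern distance: Jaccard distance of supporting transaction-id sets."""
    union = t1 | t2
    if not union:
        return 0.0
    return 1.0 - len(t1 & t2) / len(union)

def covers(rep: Pattern, p: Pattern, tids: Dict[Pattern, Set[int]], delta: float) -> bool:
    """rep delta-covers p if it contains p's items and their distance is at most delta."""
    return p <= rep and distance(tids[p], tids[rep]) <= delta

def greedy_representatives(tids: Dict[Pattern, Set[int]], delta: float) -> List[Pattern]:
    """Greedy set cover: repeatedly pick the pattern covering the most uncovered patterns."""
    cover_sets = {r: {p for p in tids if covers(r, p, tids, delta)} for r in tids}
    uncovered = set(tids)
    reps: List[Pattern] = []
    while uncovered:
        best = max(cover_sets, key=lambda r: len(cover_sets[r] & uncovered))
        reps.append(best)
        uncovered -= cover_sets[best]
    return reps
```

Because every pattern covers itself, the loop always terminates; with δ = 0, a pattern is absorbed only by a superset with exactly the same supporting transactions.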
International Journal of Computer Applications, 2013
In data mining, frequent-pattern extraction is widely used to find association rules. Association rule mining approaches are generally applied bottom-up or top-down on a compressed data structure. Previous works have proposed various approaches to mine frequent patterns from given databases. In this paper, we propose a new approach that combines the closed and intersection approaches on a compressed data structure: the closed approach is applied bottom-up and the intersection approach top-down. This combined approach reduces search time by reducing the number of database scans needed to find closed frequent patterns and their association rules. The proposed algorithm has lower time complexity, whereas classical approaches such as Apriori take more time for the given items in the dataset. Experimental results show that our approach is more efficient and effective than the traditional Apriori algorithm.
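The abstract does not spell out the intersection step. A well-known fact that intersection-based methods can exploit is that every closed itemset equals the intersection of the transactions containing it. The sketch below is our own illustration of that fact, not the paper's algorithm: it enumerates closed itemsets by incrementally intersecting transactions, then counts support. `closed_itemsets` is a hypothetical name.

```python
from typing import Dict, FrozenSet, Iterable, Set

def closed_itemsets(transactions: Iterable[Set[str]], min_sup: int) -> Dict[FrozenSet[str], int]:
    """Closed itemsets with support >= min_sup, found as intersections of transactions."""
    tx = [frozenset(t) for t in transactions]
    closed: Set[FrozenSet[str]] = set()
    for t in tx:
        # Intersect the new transaction with every closed set found so far,
        # then add the transaction itself (it is the intersection of {t}).
        closed |= {c & t for c in closed}
        closed.add(t)
    result: Dict[FrozenSet[str], int] = {}
    for c in closed:
        if not c:
            continue  # skip the empty itemset
        sup = sum(1 for t in tx if c <= t)
        if sup >= min_sup:
            result[c] = sup
    return result
```

The number of distinct intersections can grow exponentially, so this direct form only suits small inputs.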
A Compression Algorithm for Mining Frequent Itemsets
In this chapter, we propose a new algorithm for mining frequent itemsets, named AMFI (Algorithm for Mining Frequent Itemsets). AMFI compresses the data while maintaining the semantics needed for the frequent itemset mining problem, and for this task it is more efficient than other algorithms that use traditional compression algorithms. AMFI's efficiency is based on a compressed vertical binary representation of the data and on very fast support counting. AMFI introduces a novel way to use equivalence classes of itemsets, performing a breadth-first search through them and storing the class prefix support in compressed arrays. We compared our proposal with an implementation that uses the PackBits algorithm to compress the data.
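The vertical binary representation can be illustrated with uncompressed bitsets: each item maps to a bit vector with one bit per transaction, and an itemset's support is the number of set bits in the AND of its items' vectors. This minimal Python sketch uses plain integers as bit vectors and omits AMFI's compression and equivalence-class search; the function names are hypothetical.

```python
from functools import reduce
from operator import and_
from typing import Dict, Hashable, Iterable, List, Set

def vertical_bitsets(transactions: List[Set[Hashable]]) -> Dict[Hashable, int]:
    """Map each item to an int whose bit `tid` is set if transaction tid contains it."""
    bits: Dict[Hashable, int] = {}
    for tid, t in enumerate(transactions):
        for item in t:
            bits[item] = bits.get(item, 0) | (1 << tid)
    return bits

def support(bits: Dict[Hashable, int], itemset: Iterable[Hashable]) -> int:
    """Support of a non-empty itemset: popcount of the AND of its items' bit vectors."""
    vectors = [bits.get(item, 0) for item in itemset]
    if not vectors:
        raise ValueError("itemset must be non-empty")
    return bin(reduce(and_, vectors)).count("1")
```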
Mining Top-K Frequent Closed Patterns without Minimum Support
In this paper, we propose a new mining task: mining top-k frequent closed patterns of length no less than min_ℓ, where k is the desired number of frequent closed patterns to be mined and min_ℓ is the minimal length of each pattern. An efficient algorithm, called TFP, is developed for mining such patterns without a minimum support. Two methods, closed node count and descendant sum, are proposed to effectively raise the support threshold and prune the FP-tree both during and after its construction. During the mining process, a novel combined top-down and bottom-up FP-tree mining strategy is developed to speed up support raising and closed frequent pattern discovery. In addition, a fast hash-based closed pattern verification scheme is employed to efficiently check whether a potential closed pattern is really closed.
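TFP's closed node count and descendant sum operate on the FP-tree and are not reproduced here. The underlying idea of raising the support threshold while mining can be sketched with a size-k min-heap: once k qualifying patterns are held, the smallest support in the heap becomes the threshold below which candidates are pruned. This Python sketch assumes candidate closed patterns and their supports arrive as a stream; `top_k_closed` and `min_len` (standing in for min_ℓ) are hypothetical names.

```python
import heapq
from itertools import count
from typing import FrozenSet, Iterable, List, Tuple

def top_k_closed(
    candidates: Iterable[Tuple[FrozenSet[str], int]], k: int, min_len: int
) -> List[Tuple[FrozenSet[str], int]]:
    """Keep the k highest-support patterns of length >= min_len, raising the threshold as we go."""
    heap: List[Tuple[int, int, FrozenSet[str]]] = []  # min-heap of (support, tiebreak, pattern)
    tie = count()
    threshold = 0  # no minimum support is given up front
    for pattern, support in candidates:
        if len(pattern) < min_len or support < threshold:
            continue  # pruned by length or by the raised threshold
        if len(heap) < k:
            heapq.heappush(heap, (support, next(tie), pattern))
        elif support > heap[0][0]:
            heapq.heapreplace(heap, (support, next(tie), pattern))
        if len(heap) == k:
            threshold = heap[0][0]  # support of the current k-th best pattern
    return [(p, s) for s, _, p in sorted(heap, key=lambda e: (-e[0], e[1]))]
```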
Fuzzy Frequent Pattern Mining by Compressing Large Databases
International Journal of Advance Engineering and Research Development, 2015
The task of extracting useful and interesting knowledge from large data is called data mining. It has many aspects, such as clustering, classification, anomaly detection, and association rule mining. Among these, association rule mining has gained a lot of interest among researchers. Applications of association mining include analysis of stock databases, web data mining, medical diagnosis, and analysis of customer behavior. In the past, researchers developed many algorithms for mining frequent itemsets, but these algorithms generate candidate itemsets. To overcome this, tree-based approaches to mining frequent patterns were developed; they perform mining by constructing a tree with an item at each node, which eliminates this disadvantage of most earlier algorithms. This paper addresses the problem of finding frequent itemsets by compressing the fuzzy FP-tree, which confines itemsets to fuzzy regions with membership values. Applying the compression mechanism yields a compact tree structure that reduces computation time. The proposed method is compared with the conventional method to analyze its performance.
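Fuzzy frequent-pattern mining replaces binary item occurrence with membership degrees. The sketch below shows one common formulation, assumed here rather than taken from the paper: each quantity is mapped to fuzzy regions by triangular membership functions, and the fuzzy support of a set of (item, region) pairs is the sum over records of the minimum membership. The boundaries in `REGIONS` are hypothetical, and the fuzzy FP-tree and its compression are not shown.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical fuzzy regions: (left foot, peak, right foot) of a triangle.
REGIONS: Dict[str, Tuple[float, float, float]] = {
    "low": (0.0, 1.0, 6.0),
    "mid": (1.0, 6.0, 11.0),
    "high": (6.0, 11.0, 16.0),
}

def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership of x in a triangle rising from a to peak b and falling to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def membership(record: Dict[str, float], item: str, region: str) -> float:
    """Degree to which the record's quantity of item lies in region (0 if item is absent)."""
    if item not in record:
        return 0.0
    return triangular(record[item], *REGIONS[region])

def fuzzy_support(records: List[Dict[str, float]], itemset: Iterable[Tuple[str, str]]) -> float:
    """Sum over records of the minimum membership across a non-empty set of fuzzy items."""
    pairs = list(itemset)
    return sum(min(membership(r, i, reg) for i, reg in pairs) for r in records)
```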
Mining top-k frequent patterns in the presence of the memory constraint
In this paper we explore a practically interesting mining task: retrieving top-k (closed) itemsets in the presence of a memory constraint. Specifically, whereas most previous works concentrate on improving mining efficiency or on reducing memory size on a best-effort basis, we first specify the upper bound on memory that can be used for mining frequent itemsets. To comply with this upper bound on memory consumption, two efficient algorithms, called MTK and MTK_Close, are devised for mining frequent itemsets and closed itemsets, respectively, without specifying the subtle minimum support. Instead, users only need to give a more human-understandable parameter, namely the desired number of frequent (closed) itemsets k. In practice, it is quite challenging to constrain memory consumption while also efficiently retrieving top-k itemsets. To achieve this, MTK and MTK_Close are devised as level-wise search algorithms in which the number of candidates generated and tested in each database scan is limited. A novel search approach, called δ-stair search, is used in MTK and MTK_Close to effectively assign the available memory to testing candidate itemsets of various lengths, which leads to a small number of required database scans. As demonstrated in the empirical study on real and synthetic data, MTK and MTK_Close do not merely offer a trade-off between execution efficiency and memory consumption: they achieve high efficiency within a constrained memory bound, showing their advantage as practical algorithms for mining frequent patterns.
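The memory constraint can be pictured as a cap on how many candidates are counted per database scan. The following Python sketch is a simplification and not MTK's δ-stair search: it mines top-k frequent itemsets level by level, splits each level's candidates into batches of at most `max_candidates` (a hypothetical parameter standing in for the memory budget), and raises the support threshold to the k-th best support found so far, pruning candidates below it by anti-monotonicity. It counts scans so the effect of the cap is visible.

```python
import heapq
from itertools import combinations, count
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Itemset = FrozenSet[str]

def count_batch(transactions: List[Itemset], batch: List[Itemset]) -> Dict[Itemset, int]:
    """One simulated database scan: count the support of every candidate in the batch."""
    counts = dict.fromkeys(batch, 0)
    for t in transactions:
        for c in batch:
            if c <= t:
                counts[c] += 1
    return counts

def top_k_levelwise(transactions: Iterable[Set[str]], k: int, max_candidates: int):
    """Return (top-k itemsets with supports, number of scans); assumes k >= 1."""
    tx = [frozenset(t) for t in transactions]
    heap: List[Tuple[int, int, Itemset]] = []  # min-heap holding the current top-k
    tie = count()
    scans = 0

    def threshold() -> int:
        # Support of the k-th best itemset so far; 1 until k itemsets are held.
        return heap[0][0] if len(heap) == k else 1

    candidates = sorted({frozenset([i]) for t in tx for i in t}, key=sorted)
    while candidates:
        survivors: List[Itemset] = []
        for start in range(0, len(candidates), max_candidates):
            batch = candidates[start:start + max_candidates]
            counts = count_batch(tx, batch)
            scans += 1
            for c in batch:
                s = counts[c]
                if s < threshold():
                    continue  # anti-monotone: no superset can enter the top-k
                survivors.append(c)
                if len(heap) < k:
                    heapq.heappush(heap, (s, next(tie), c))
                elif s > heap[0][0]:
                    heapq.heapreplace(heap, (s, next(tie), c))
        # Apriori join: keep (L+1)-itemsets whose L-subsets all survived.
        level = set(survivors)
        joined = {a | b for a, b in combinations(survivors, 2) if len(a | b) == len(a) + 1}
        candidates = sorted((c for c in joined if all(c - {i} in level for i in c)), key=sorted)
    ranked = sorted(heap, key=lambda e: (-e[0], e[1]))
    return [(itemset, s) for s, _, itemset in ranked], scans
```

A smaller `max_candidates` yields the same result with more scans, which is the cost that a smarter memory allocation such as δ-stair search aims to reduce.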
High performance frequent patterns extraction using compressed FP-tree
Many algorithms have been proposed to improve the performance of mining frequent patterns from transaction databases. Pattern-growth algorithms based on the FP-tree, such as FP-Growth, are more efficient than candidate generation-and-test algorithms. In this paper, we propose a new data structure named Compressed FP-Tree (CFP-Tree) and an algorithm named CT-PRO that performs better than current algorithms including FP-Growth, OpportuneProject, and Apriori. A CFP-Tree can have up to 50% fewer nodes than the corresponding FP-Tree. CT-PRO is empirically compared with FP-Growth, OpportuneProject, Apriori, and CT-ITL using datasets that reveal the effective performance range of these algorithms. CT-PRO is also extended for mining very large databases, and its scalability is evaluated experimentally.
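For reference, the sketch below builds a plain FP-tree, the structure the CFP-Tree's node count is compared against: infrequent items are dropped, each transaction's items are sorted by descending frequency (ties broken by item name, our choice), and transactions sharing a prefix share nodes. It omits the header table and node links and does not implement the CFP-Tree compression; the names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

class FPNode:
    def __init__(self, item: Optional[str], parent: Optional["FPNode"]):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children: Dict[str, "FPNode"] = {}

def build_fp_tree(transactions: List[Iterable[str]], min_sup: int) -> FPNode:
    """Insert each transaction's frequent items, in frequency order, as a root path."""
    freq = Counter(item for t in transactions for item in set(t))
    # Sort key for frequent items: descending frequency, then item name.
    rank = {item: (-c, item) for item, c in freq.items() if c >= min_sup}
    root = FPNode(None, None)
    for t in transactions:
        path = sorted((i for i in set(t) if i in rank), key=rank.__getitem__)
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1  # one more transaction passes through this node
            node = child
    return root

def count_nodes(node: FPNode) -> int:
    """Number of nodes below `node`, excluding `node` itself."""
    return sum(1 + count_nodes(c) for c in node.children.values())
```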
A novel approach for mining maximal frequent patterns
Expert Systems with Applications, 2017
Mining maximal frequent patterns (MFPs) is an approach that limits the number of frequent patterns (FPs) to help intelligent systems operate efficiently. Many approaches have been proposed for mining MFPs, but the problem is highly complex, so run time and memory usage remain large. Recently, the N-list structure has been proposed and shown to be very effective for mining FPs, frequent closed patterns, and top-rank-k FPs.
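To illustrate what an MFP is, independently of the N-list structure, the following Python sketch enumerates frequent itemsets by brute force and keeps those with no frequent proper superset. It is only suitable for tiny examples; the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set

def frequent_itemsets(transactions: Iterable[Set[str]], min_sup: int) -> Dict[FrozenSet[str], int]:
    """Brute-force level-wise enumeration of all itemsets with support >= min_sup."""
    tx = [frozenset(t) for t in transactions]
    items = sorted(set().union(*tx))
    result: Dict[FrozenSet[str], int] = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            s = frozenset(combo)
            sup = sum(1 for t in tx if s <= t)
            if sup >= min_sup:
                result[s] = sup
                found = True
        if not found:
            break  # anti-monotonicity: no frequent itemset of this size means none larger
    return result

def maximal_patterns(frequent: Iterable[FrozenSet[str]]) -> Set[FrozenSet[str]]:
    """Keep the frequent itemsets that have no frequent proper superset."""
    fs = list(frequent)
    return {p for p in fs if not any(p < q for q in fs)}
```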
Compressed Frequent Pattern Tree
The use of data mining is increasing rapidly as the daily analysis of transaction databases grows. In such data, various items occur frequently in the same pattern. Many algorithms are available for finding frequent patterns; the existing system uses Apriori and FP-Growth, which are time-consuming and inefficient. In the proposed system, we use a more compact data structure named the Compressed FP-Tree, and we propose a new algorithm, CT-PRO, that uses it. The proposed algorithm is much more efficient in terms of performance.
Fast frequent itemset mining using compressed data representation
21st IASTED International Multi-Conference …, 2003
RP Gopalan, YG Sucahyo. 21st IASTED International Multi-Conference on Applied Informatics, pp. 1203-1208, 2003. Discovering association rules by identifying ...