Efficient Mining Top-k Regular-Frequent Itemset Using Compressed Tidsets

Mining top-k regular-frequent itemsets from transactional database

2010

Association rule mining based on the support-confidence framework is an important task in the data mining community. However, the occurrence frequency of a pattern may not be a sufficient criterion for mining meaningful patterns; in several applications, a pattern's occurrence behavior is also a key criterion. A pattern is regular if it occurs regularly within a user-given period (the regularity threshold). When mining regular itemsets, a support threshold is used to filter out some of them. In practice, however, it is often difficult for users to provide an appropriate support threshold: too small a threshold can yield an impractically large number of regular-frequent itemsets, while too large a threshold can yield very few or none. Therefore, rather than relying on a support threshold, it can be better to ask users for the number of desired results. Currently, based on an extensive survey, there is no existing a...
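A minimal sketch of this problem setting, assuming the common definition of regularity as the largest gap between consecutive occurrences of an itemset (counting the gap before its first and after its last occurrence) and ranking qualifying itemsets by support. The function names `regularity` and `top_k_regular_frequent` are hypothetical, the depth-first tidset search is an illustrative choice rather than the paper's algorithm, and plain Python sets stand in for compressed tidsets.

```python
import heapq


def regularity(sorted_tids, n):
    """Largest gap between consecutive occurrences, including the gap from the
    start of the database (position 0) to the first occurrence and from the
    last occurrence to the end (position n). Tids are 1-based and sorted."""
    prev, worst = 0, 0
    for t in sorted_tids:
        worst = max(worst, t - prev)
        prev = t
    return max(worst, n - prev)


def top_k_regular_frequent(db, k, max_reg):
    """Return up to k (support, itemset) pairs with the highest support among
    itemsets whose regularity is at most max_reg (hypothetical helper, k >= 1)."""
    n = len(db)
    tidsets = {}  # vertical layout: item -> set of 1-based transaction ids
    for tid, items in enumerate(db, start=1):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    heap = []  # min-heap holding the current top-k as (support, itemset)

    def floor():
        # Support needed to stay in the race; 1 until k results are held.
        return heap[0][0] if len(heap) == k else 1

    def search(prefix, prefix_tids, tail):
        for i, item in enumerate(tail):
            tids = tidsets[item] if prefix_tids is None else prefix_tids & tidsets[item]
            # Support and regularity are both anti-monotone: a superset's tidset
            # is a subset, so its support can only drop and its gaps only widen.
            if len(tids) < floor() or regularity(sorted(tids), n) > max_reg:
                continue
            itemset = prefix + (item,)
            if len(heap) < k:
                heapq.heappush(heap, (len(tids), itemset))
            elif len(tids) > heap[0][0]:
                heapq.heapreplace(heap, (len(tids), itemset))
            search(itemset, tids, tail[i + 1:])

    search((), None, sorted(tidsets))
    return sorted(heap, key=lambda e: (-e[0], e[1]))


db = [{"a", "b"}, {"a", "c"}, {"a", "b"}, {"b", "c"}, {"a", "b", "c"}]
print(top_k_regular_frequent(db, k=3, max_reg=2))
# [(4, ('a',)), (4, ('b',)), (3, ('a', 'b'))]
```

Here `{c}` also has support 3 and regularity 2; ties at the k-th position are resolved by search order in this sketch, which is one of several reasonable policies.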

Efficiently Mining Frequent Itemsets in Transactional Databases

Journal of Marine Science and Technology, 2016

Discovering frequent itemsets is an essential task in association rule mining and is considered computationally expensive. Frequent pattern growth (FP-growth) is one of the best algorithms for mining frequent patterns. However, many experimental results have shown that building conditional FP-trees during mining consumes most of FP-growth's CPU time, and storing the FP-trees requires a lot of space. This paper presents a new approach for mining frequent itemsets from a transactional database without building conditional FP-trees, which saves considerable computation time and memory. Experimental results on datasets obtained from the FIMI repository website indicate that our method substantially reduces running time and memory usage.

An enhanced constraint based technique for frequent itemset mining in transactional databases

International Journal of Engineering & Technology, 2018

Mining frequent patterns has been a broad area of research in recent times because it has numerous social applications. Various kinds of frequent patterns are used in diverse applications, and mining them efficiently is an important research concern. Many algorithms have been proposed for mining frequent itemsets, each with its own pros and cons. The basic algorithms are Apriori, FP-growth and Eclat, and enhancements of these algorithms are ongoing. In this paper, an enhanced Varied Support Frequent Itemset (VSFIM) algorithm is proposed as an extension of the FP-growth algorithm. In the proposed approach, an individual minimum support is assigned to each item in the transactions before mining is performed. The proposed algorithm is compared with existing algorithms, and VSFIM is found to outperform them in both processing time and space utilization.
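The abstract does not state how per-item minimum supports combine for a multi-item itemset. A common rule, borrowed here as an assumption from multiple-minimum-support mining (MSApriori), is that an itemset qualifies when its support reaches the smallest minimum support among its items. The brute-force sketch below (hypothetical name `frequent_with_item_minsup`, no FP-tree) only illustrates that rule; because it breaks the usual downward-closure pruning, it simply counts every sub-itemset of every transaction.

```python
from collections import Counter
from itertools import combinations


def frequent_with_item_minsup(db, mis):
    """Brute-force sketch: keep itemsets whose support reaches the smallest
    per-item minimum support (MIS) among their items. Exponential in the
    transaction length, so only suitable for tiny examples."""
    counts = Counter()
    for transaction in db:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            counts.update(combinations(items, size))  # +1 per sub-itemset
    return {s: c for s, c in counts.items() if c >= min(mis[i] for i in s)}


db = [{"bread", "milk"}, {"bread", "caviar"}, {"bread", "milk"}, {"milk"}]
mis = {"bread": 3, "milk": 3, "caviar": 1}  # hypothetical per-item thresholds
print(sorted(frequent_with_item_minsup(db, mis).items()))
# [(('bread',), 3), (('bread', 'caviar'), 1), (('caviar',), 1), (('milk',), 3)]
```

With a single global threshold of 3, the rare item `caviar` and the pair `(bread, caviar)` would be discarded; a low threshold just for `caviar` keeps them while `(bread, milk)` (support 2) is still filtered out.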

Result Analysis of Mining Fast Frequent Itemset Using Compacted Data

International Journal of Information Sciences and Techniques, 2014

Data mining and knowledge discovery in databases attract a wide array of non-trivial research, support industrial decision-support systems, and continue to expand into promising fields such as artificial intelligence while facing real-world challenges. Association rules form an important paradigm in data mining for various kinds of databases, such as transactional, time-series, spatial and object-oriented databases. The growing volume of data spread across multiple heterogeneous sources, combined with the difficulty of building and maintaining central repositories, calls for effective distributed mining techniques. The majority of previous studies rely on an Apriori-like candidate generation-and-test approach. For these applications, such older techniques are found to be costly and slow, particularly when long patterns exist.

TreeITL-Mine: Mining Frequent Itemsets Using Pattern Growth, Tid Intersection, and Prefix Tree

Lecture Notes in Computer Science, 2002

An important problem in data mining is the discovery of association rules that identify relationships among sets of items. There are two steps in mining association rules: finding the frequent itemsets and generating association rules from them. Since the mining of frequent itemsets is computationally expensive, most of the research attention has been focused on it. Strategies to speed up the mining of frequent itemsets fall into two general categories: candidate generation-and-test and pattern-growth. The pattern growth approach overcomes some of the inefficiencies of the candidate generation-and-test approach typified by the Apriori algorithm. However, further improvements can be made to the pattern-growth approach for better performance of the mining process. This may be achieved by reducing the number of transactions held in memory and the number of traversals of each transaction. We propose to reduce the number of the transactions in memory by grouping and counting transactions that have the same item sets. The number of item traversals in the transactions can be reduced by using a modified transaction-id intersection. Based on these ideas, we have designed a new algorithm for mining frequent itemsets called TreeITL-Mine. In this paper, we present TreeITL-Mine along with a new data structure called Item-Trans Link (ITL). ITL combines the advantages of horizontal and vertical data layouts for association rule mining. In designing our algorithm, we have combined several existing ideas including pattern-growth, tid-intersection and prefix trees. We present performance comparisons of our algorithm against the fastest available implementation of the Apriori algorithm, and the recently developed H-Mine algorithm. To study the trade-offs in using the prefix tree to compress transactions, we also compare the performance of our algorithm with the prefix tree and without. We have tested all the algorithms using several widely used test datasets. 
The performance results indicate that the new algorithm significantly reduces the processing time for mining frequent itemsets from dense data sets that contain relatively long patterns. We discuss the results in detail and point out the strengths and limitations of our algorithm.
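The two ideas named in this abstract, grouping identical transactions with a count and intersecting transaction-id lists, can be illustrated in a few lines. This is a minimal sketch under our own assumptions: the helper names are hypothetical, and it does not reproduce the Item-Trans Link (ITL) structure or the prefix tree.

```python
from collections import Counter


def group_transactions(db):
    """Collapse identical transactions into (items, count) groups."""
    return list(Counter(frozenset(t) for t in db).items())


def build_group_index(groups):
    """Map each item to the set of group ids whose item set contains it."""
    index = {}
    for gid, (items, _count) in enumerate(groups):
        for item in items:
            index.setdefault(item, set()).add(gid)
    return index


def support(itemset, groups, index):
    """Intersect the items' group-id sets (a tid intersection over groups),
    then weight each surviving group by its transaction count.
    itemset must be non-empty."""
    gids = set.intersection(*(index.get(item, set()) for item in itemset))
    return sum(groups[g][1] for g in gids)


db = [["a", "b"], ["b", "a"], ["a", "c"], ["a", "b", "c"]]
groups = group_transactions(db)  # {a,b} x2, {a,c} x1, {a,b,c} x1
index = build_group_index(groups)
print(support(("a", "b"), groups, index))  # 3
```

The intersections now run over three groups instead of four transactions; the saving grows with the number of duplicate transactions in a dataset.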

IIS-Mine: A new efficient method for mining frequent itemsets

2012

A new approach to mine all frequent itemsets from a transaction database is proposed. Its main features are as follows: (1) the proposed algorithm scans the database only once to construct a data structure called an inverted index structure (IIS); (2) the structure is unaffected by changes in the minimum support threshold, so the database does not need to be rescanned; and (3) the proposed mining algorithm, IIS-Mine, uses an efficient property of an extendable itemset, which reduces the recursiveness of the mining steps without generating candidate itemsets, allowing frequent itemsets to be found quickly. We provide definitions, examples, and a theorem whose completeness and correctness are shown by mathematical proof. We present experiments in which run time, memory consumption and scalability are compared with those of a frequent-pattern (FP) growth algorithm as the minimum support threshold is varied. Both algorithms are evaluated on synthetic and real-world datasets. The experimental results demonstrate that IIS-Mine outperforms FP-growth in run time and space consumption and is effective on dense datasets.
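Features (1) and (2) can be sketched as follows: an inverted index is built in a single pass, and mining at different minimum supports reuses it without touching the database again. The class name `InvertedIndex` is hypothetical, the depth-first tidset intersection is a generic stand-in, and the extendable-itemset property of IIS-Mine is not modeled.

```python
class InvertedIndex:
    """Hypothetical single-scan index: item -> set of transaction ids."""

    def __init__(self, db):
        self.tids = {}
        for tid, items in enumerate(db):  # the only pass over the database
            for item in items:
                self.tids.setdefault(item, set()).add(tid)

    def frequent(self, min_sup):
        """Mine all itemsets with support >= min_sup from the index alone."""
        result = {}

        def grow(prefix, prefix_tids, tail):
            for i, item in enumerate(tail):
                tids = self.tids[item] if prefix_tids is None else prefix_tids & self.tids[item]
                if len(tids) >= min_sup:  # infrequent itemsets have no frequent supersets
                    itemset = prefix + (item,)
                    result[itemset] = len(tids)
                    grow(itemset, tids, tail[i + 1:])

        grow((), None, sorted(self.tids))
        return result


index = InvertedIndex([["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]])
print(index.frequent(3))  # {('a',): 3, ('b',): 3}
```

Calling `index.frequent(2)` afterwards re-mines at a lower threshold from the same index, which is the "no rescan" property the abstract describes.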

A Survey on Frequent Item Set Mining for Large Transactional Data

Information Technology in Industry, 2021

Data analytics plays an important role in the decision-making process. The insights obtained from pattern analysis bring many benefits, such as cost cutting, good revenue, and a better competitive advantage. On the other hand, extracting hidden frequent-itemset patterns takes more time as data grows over time, and because the computation is heavy, memory consumption during mining must be kept low. An efficient algorithm is therefore required to mine hidden frequent-itemset patterns with low memory use and short run time. This paper reviews different algorithms for finding frequent itemsets so that a more efficient algorithm can be developed.

Postdiffset: an Eclat-like algorithm for frequent itemset mining

International Journal of Engineering & Technology

Frequent itemset mining is a major field of data mining because it deals with the common occurrence of sets of items in database transactions. Originating in market basket analysis, frequent itemset generation can lead to the formulation of association rules that capture correlations or patterns. Association rule mining remains one of the most prominent areas of data mining; it aims to extract interesting correlations, frequent patterns, associations or causal structures among sets of items in transaction databases. Association rule mining algorithms are built on either horizontal or vertical data formats, and both formats have been widely discussed through example algorithms for each. Horizontal approaches suffer from excessive candidate generation and multiple database scans, which contribute to higher memory consumption. To improve on the horizontal approach, the works on vertical...
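To make the horizontal/vertical distinction concrete, the sketch below converts a horizontal database into the vertical (tidset) layout used by Eclat-like algorithms, then computes a pair's support from a diffset, the difference-set idea used by dEclat: d(xy) = t(x) − t(y) and sup(xy) = sup(x) − |d(xy)|. The helper names are hypothetical, and the abstract does not say how Postdiffset itself combines tidsets and diffsets, so this is only the general technique.

```python
def to_vertical(db):
    """Convert a horizontal database (one item list per transaction) into a
    vertical one: item -> set of transaction ids (its tidset)."""
    vertical = {}
    for tid, items in enumerate(db):
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def pair_support_via_diffset(vertical, x, y):
    """Support of {x, y} from the diffset d(xy) = t(x) - t(y), i.e. the
    transactions containing x but not y: sup(xy) = sup(x) - |d(xy)|."""
    diff = vertical[x] - vertical[y]
    return len(vertical[x]) - len(diff), diff


db = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
vertical = to_vertical(db)  # a: {0, 1, 2}, b: {0, 2, 3}, c: {1, 2}
print(pair_support_via_diffset(vertical, "a", "b"))  # (2, {1})
```

On dense data the diffset is often much smaller than the tidset it replaces, which is the memory argument behind vertical diffset-based methods.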

A new approach for the discovery of frequent itemsets

1999

The discovery of the most recurrent association rules in a large database of sales transactions requires identifying the sets of items bought together by a sufficiently large population of customers. This is a critical task, since the number of generated itemsets grows exponentially with the total number of items. Most algorithms start by identifying the sets with the lowest cardinality and then progressively increase it.
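The level-wise strategy described here (lowest cardinality first, then growing) is typified by the textbook Apriori algorithm. The sketch below is that generic scheme, not the method proposed in this 1999 paper, which the abstract does not describe; the function name and data are illustrative.

```python
from itertools import combinations


def apriori(db, min_sup):
    """Level-wise search: count k-itemsets, keep the frequent ones, and build
    (k+1)-candidates from them, pruning any candidate with an infrequent subset."""
    transactions = [frozenset(t) for t in db]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


result = apriori([["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]], min_sup=2)
print(len(result))  # 5: {a}, {b}, {c}, {a, b}, {a, c}
```

The candidate `{a, b, c}` is never counted: its subset `{b, c}` (support 1) is infrequent, so the prune step removes it before the database is scanned again.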

CBW: an efficient algorithm for frequent itemset mining

Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004

Frequent itemset generation is the prerequisite and most time-consuming process for association rule mining. Most efficient Apriori-like algorithms rely heavily on the minimum support constraint to prune a vast number of non-candidate itemsets. This pruning technique, however, becomes less useful in real applications where the supports of interesting itemsets are extremely small, such as medical diagnosis and fraud detection. In this paper, we propose a new algorithm that maintains its performance even at relatively low supports. Empirical evaluations show that our algorithm is, on average, more than an order of magnitude faster than Apriori-like algorithms.