Mining Approximate Frequent Itemsets In the Presence of Noise: Algorithm and Analysis

A New Approach for Approximately Mining Frequent Itemsets

2019

Mining frequent itemsets in transaction databases is an important task in many applications. This task becomes challenging when dealing with a very large transaction database because traditional algorithms do not scale within memory limits. In this paper, we propose a new approach for approximately mining frequent itemsets in a transaction database. First, we partition the set of transactions in the database into disjoint subsets and make the distribution of frequent itemsets in each subset similar to that of the entire database. Then, we randomly select a set of subsets and independently mine the frequent itemsets in each of them. After that, each frequent itemset discovered from these subsets is put to a vote, and any itemset appearing in the majority of the selected subsets is determined to be a frequent itemset, called a popular frequent itemset. All popular frequent itemsets are compared with the frequent itemsets discovered directly from the entire database using the same frequency threshold. The r...
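The partition, sample, and vote steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the brute-force per-subset miner, the `max_size` cap, and the use of a random shuffle to make each subset resemble the whole database are all assumptions made for the example.

```python
import random
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force miner (assumed stand-in): itemsets of up to max_size items
    whose relative support in `transactions` is at least min_support."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    n = len(transactions)
    return {s for s, c in counts.items() if c / n >= min_support}


def popular_frequent_itemsets(transactions, n_parts, n_sampled, min_support, seed=0):
    """Partition the database, mine a random sample of parts, keep majority winners."""
    rng = random.Random(seed)
    shuffled = list(transactions)
    rng.shuffle(shuffled)  # assumption: random assignment approximates the global distribution
    parts = [shuffled[i::n_parts] for i in range(n_parts)]  # disjoint subsets
    chosen = rng.sample(parts, n_sampled)
    votes = Counter()
    for part in chosen:
        for itemset in frequent_itemsets(part, min_support):
            votes[itemset] += 1
    # An itemset is "popular" if it is frequent in more than half of the sampled parts.
    return {s for s, v in votes.items() if v > n_sampled / 2}
```

Each sampled part is mined independently, so in a real setting the loop over `chosen` could run in parallel and no single step would need the whole database in memory.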

IIS-Mine: A new efficient method for mining frequent itemsets

2012

A new approach to mine all frequent itemsets from a transaction database is proposed. The main features of this paper are as follows: (1) the proposed algorithm performs database scanning only once to construct a data structure called an inverted index structure (IIS); (2) this structure is not affected by changes in the minimum support threshold, and as a result, a rescan of the database is not required; and (3) the proposed mining algorithm, IIS-Mine, uses an efficient property of an extendable itemset, which reduces the recursiveness of mining steps without generating candidate itemsets, allowing frequent itemsets to be found quickly. We have provided definitions, examples, and a theorem, the completeness and correctness of which are shown by mathematical proof. We present experiments in which the run time, memory consumption and scalability are tested in comparison with a frequent-pattern (FP) growth algorithm when the minimum support threshold is varied. Both algorithms are evaluated by applying them to synthetic and real-world datasets. The experimental results demonstrate that IIS-Mine provides better performance than FP-growth in terms of run time and space consumption and is effective when used on dense datasets.
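To illustrate why a single scan can suffice, here is a rough sketch of a generic inverted index over transactions. It is an assumption about what such a structure might look like (item mapped to the set of transaction IDs containing it), not the paper's IIS layout, and the function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(transactions):
    """One pass over the database: map each item to the IDs of transactions containing it."""
    index = defaultdict(set)
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            index[item].add(tid)
    return dict(index)


def index_support(index, itemset):
    """Support of a non-empty itemset: size of the intersection of its items' ID sets."""
    id_sets = [index.get(item, set()) for item in itemset]
    if not id_sets:
        raise ValueError("itemset must contain at least one item")
    return len(set.intersection(*id_sets))


def frequent_items(index, min_count):
    """Re-query with any threshold; the index itself never needs rebuilding."""
    return {item for item, tids in index.items() if len(tids) >= min_count}
```

Because the index stores full occurrence lists rather than anything filtered by a threshold, calling `frequent_items` with a different `min_count` reuses the same structure without rescanning the transactions.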

A LITERATURE SURVEY ON FREQUENT ITEMSET MINING – AN ARM PERSPECTIVE

Association rule mining (ARM), which finds relationships between distinct itemsets, plays an essential role in itemset mining. Frequent itemset mining is one of the popular data mining techniques, and it can be used in many data mining fields for finding highly correlated itemsets. Frequent items are those items that occur frequently in the database. Infrequent itemset mining, the inverse of frequent itemset mining, finds the rarely occurring itemsets in the database. Several existing techniques for mining frequent and infrequent itemsets have high computing time and scale poorly as the database size increases. This paper focuses on relating the existing algorithms that mine frequent and infrequent itemsets, to help future researchers find a way forward in the domain of association rule mining. Keywords: Association rule mining (ARM), Apriori, Frequent items, FP-growth, Infrequent items, performance.

CBW: an efficient algorithm for frequent itemset mining

Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004

Frequent itemset generation is the prerequisite and most time-consuming process for association rule mining. Nowadays, most efficient Apriori-like algorithms rely heavily on the minimum support constraint to prune a vast number of non-candidate itemsets. This pruning technique, however, becomes less useful for some real applications where the supports of interesting itemsets are extremely small, such as medical diagnosis and fraud detection, among others. In this paper, we propose a new algorithm that maintains its performance even at relatively low supports. Empirical evaluations show that our algorithm is, on average, more than an order of magnitude faster than Apriori-like algorithms.

An Algorithm for Mining Frequent Itemsets

2008 5th International Conference on Electrical Engineering, Computing Science and Automatic Control, 2008

In this paper, we propose a new algorithm for mining frequent itemsets, named AMFI (Algorithm for Mining Frequent Itemsets). This algorithm compresses the data while maintaining the semantics necessary for the frequent itemset mining problem, and it is more efficient than traditional compression algorithms. AMFI's efficiency is based on a compressed vertical binary representation of the data and on a very fast support count. AMFI performs a breadth-first search through equivalence classes. We compare our proposal with an implementation using the PackBits algorithm.
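A vertical binary representation can be illustrated with plain Python integers used as bitmaps. This sketch shows only the uncompressed idea (one bitmap per item, support via bitwise AND and a bit count); AMFI's compression scheme and its equivalence-class search are not reproduced here, and the function names are hypothetical.

```python
def vertical_bitmaps(transactions):
    """Map each item to an int whose bit t is set when transaction t contains the item."""
    bitmaps = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def bitmap_support(bitmaps, itemset):
    """AND the bitmaps of all items in a non-empty itemset and count the set bits."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must contain at least one item")
    acc = bitmaps.get(items[0], 0)
    for item in items[1:]:
        acc &= bitmaps.get(item, 0)
    return bin(acc).count("1")  # number of transactions containing every item
```

Counting support this way touches one machine word per block of transactions instead of one set element per transaction, which is the usual motivation for bitmap-based vertical layouts.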

Adaptive and resource-aware mining of frequent sets

2002

The performance of an algorithm that mines frequent sets from transactional databases may depend heavily on the specific features of the data being analyzed. Moreover, some architectural characteristics of the computational platform used (e.g. the available main memory) can dramatically change the runtime behavior of the algorithm. In this paper we present DCI (Direct Count & Intersect), an efficient data mining algorithm for discovering frequent sets from large databases, which effectively addresses the issues mentioned above. DCI adopts a classical level-wise approach based on candidate generation to extract frequent sets, but uses a hybrid method to determine candidate supports. The most innovative contribution of DCI lies in the multiple heuristic strategies employed, which permit DCI to adapt its behavior not only to the features of the specific computing platform, but also to the features of the dataset being mined, so that it is effective in mining both short and long patterns from sparse and dense datasets. The large number of tests conducted permits us to state that DCI noticeably outperforms state-of-the-art algorithms for both synthetic and real-world datasets. Finally, we also discuss the parallelization strategies adopted in the design of ParDCI, a distributed and multi-threaded implementation of DCI.
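For readers unfamiliar with the classical level-wise approach mentioned above, the following is a minimal Apriori-style sketch. It uses direct counting at every level and does not attempt DCI's hybrid counting/intersection method or its adaptive heuristics; the function name and the absolute `min_count` parameter are assumptions for the example.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Level-wise mining: frequent k-itemsets are joined into (k+1)-candidates."""
    transactions = [frozenset(t) for t in transactions]
    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while level:
        # Count each candidate's support by scanning the transactions directly.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        survivors = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(survivors)
        # Build (k+1)-candidates, pruning any that has an infrequent k-subset.
        k += 1
        level = set()
        for a, b in combinations(survivors, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in survivors for s in combinations(union, k - 1)
            ):
                level.add(union)
    return frequent
```

The pruning step relies on the fact that every subset of a frequent itemset must itself be frequent, so a candidate with any infrequent subset can be discarded before it is counted.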

Efficiently Mining Frequent Itemsets using Various Approaches: A Survey

International Journal of Computer Applications, 2012

In this paper we present the various elementary traversal approaches for mining association rules. We start with a formal definition of an association rule and its basic algorithm. We then discuss association rule mining algorithms from several perspectives, such as the breadth-first approach, the depth-first approach and the hybrid approach. The approaches are compared in terms of time complexity and the I/O overhead on the CPU. Finally, this paper looks at the prospects for association rule mining and discusses the areas where there is scope for improving scalability.

A Survey on Approaches for Mining Frequent Itemsets

Data mining is gaining importance due to the huge amount of data available. Retrieving information from the warehouse is not only tedious but also difficult in some cases. The most important uses of data mining are customer segmentation in marketing, shopping cart analysis, customer relationship management, campaign management, Web usage mining, text mining, player tracking and so on. In data mining, association rule mining is one of the important techniques for discovering meaningful patterns from large collections of data. Discovering frequent itemsets plays an important role in mining association rules, sequence rules, web logs and many other interesting patterns among complex data. This paper presents a literature review on different techniques for mining frequent itemsets.

TR-2009001: BISC: A Binary Itemset Support Counting Approach towards Efficient Frequent Itemset Mining

2009

The performance of a depth-first Frequent Itemset Mining (FIM) algorithm is closely related to the total number of recursions, which can be modeled as O(n^k), where k is the maximal recursion depth and n is the branching factor. Many existing approaches focus more on improving support counting than on decreasing n and k, which may lead to unsatisfactory performance as they grow. In this paper a novel approach, Binary Itemset Support Counting (BISC), is presented to address these two factors. Let the direct support of an itemset I be the number of transactions whose itemset is exactly I. BISC can derive the supports of all the itemsets in a database by iteratively updating their direct supports, thus eliminating the need for further recursion. BISC converts a database into its binary representation and combines one-stage BISC and two-stage BISC to minimize the cost of support updating and memory consumption by eliminating redundant updating operations. By applying BISC with the b...
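The relationship between direct supports and supports can be illustrated with the standard sum-over-supersets update on bitmasks. This is a generic technique chosen for the sketch and is not claimed to be the one-stage or two-stage BISC procedure from the report; the function names and the `item_index` mapping are hypothetical.

```python
def direct_supports(transactions, item_index):
    """direct[mask] = number of transactions whose item set is exactly `mask`."""
    direct = [0] * (1 << len(item_index))
    for transaction in transactions:
        mask = 0
        for item in transaction:
            mask |= 1 << item_index[item]  # binary representation of the transaction
        direct[mask] += 1
    return direct


def supports_from_direct(direct, n_items):
    """Iteratively update direct supports so that support[mask] counts every
    transaction containing all items in `mask` (a sum over supersets)."""
    support = list(direct)
    for i in range(n_items):
        bit = 1 << i
        for mask in range(1 << n_items):
            if not mask & bit:
                # Fold in the itemsets that additionally contain item i.
                support[mask] += support[mask | bit]
    return support
```

The update loops over items and masks rather than recursing, at a cost proportional to n_items times 2^n_items, so a table like this is only practical for a small number of distinct items.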