Fast Algorithms for Mining Interesting Frequent Itemsets without Minimum Support

Applying bit-vector projection approach for efficient mining of N-most interesting frequent itemsets

2007

Real-world datasets are sparse, dirty, and contain hundreds of items. In such situations, discovering interesting rules (results) with the traditional frequent itemset mining approach, which requires a user-defined support threshold, is not appropriate: without domain knowledge, setting the threshold too high can output nothing, while setting it too low can output a large number of redundant, uninteresting results. Recently, a novel approach of mining the N-most interesting itemsets was proposed, which discovers only the top N interesting results without any user-defined support threshold. However, mining N-most interesting itemsets is more costly in terms of itemset search space exploration and processing cost. The efficiency of the mining process therefore depends heavily on itemset frequency (support) counting, implementation techniques, and the projection of relevant transactions to lower-level nodes of the search space. In this paper, we present a novel N-most interesting itemset mining algorithm (N-MostMiner) using the bit-vector representation approach, which is very efficient in terms of itemset frequency counting and transaction projection. We also present several efficient implementation techniques that we used in our implementation of N-MostMiner. Our experimental results on benchmark datasets suggest that N-MostMiner is very efficient in terms of processing time compared with the current best algorithm, BOMO.
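The two ideas in this abstract, support counting on bit-vectors and ranking by support instead of filtering by a threshold, can be illustrated with a minimal sketch. This is not N-MostMiner itself: the function names are hypothetical, Python integers stand in for bit-vectors, and the search is a brute-force enumeration of k-itemsets with no projection step.

```python
from itertools import combinations


def build_bitvectors(transactions):
    """Map each item to an int whose bit t is set when transaction t contains it."""
    vectors = {}
    for t, items in enumerate(transactions):
        for item in items:
            vectors[item] = vectors.get(item, 0) | (1 << t)
    return vectors


def support(itemset, vectors):
    """Support is the number of set bits in the AND of the items' bit-vectors."""
    items = iter(itemset)
    acc = vectors.get(next(items), 0)
    for item in items:
        acc &= vectors.get(item, 0)
    return bin(acc).count("1")


def top_n_itemsets(transactions, k, n):
    """Return the n highest-support k-itemsets (brute force, illustration only)."""
    vectors = build_bitvectors(transactions)
    scored = [(support(c, vectors), c) for c in combinations(sorted(vectors), k)]
    # Highest support first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:n]


transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}, {"a", "b"}]
print(top_n_itemsets(transactions, k=2, n=2))
```

The caller supplies only N and the itemset size, never a support value, which is the property the abstract emphasizes.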

Dynamic bit vectors: An efficient approach for mining frequent itemsets

There are two common data formats in data mining: horizontal and vertical. Approaches based on vertical data formats have the advantages of requiring fewer database scans and computing itemset supports quickly. One vertical data representation, the bit vector, has recently been widely used for mining frequent item sets and has produced significant results. The sizes of bit vectors for item sets are, however, always the same, equal to the number of transactions in the database. In this paper, we propose the scheme of dynamic bit vectors to reduce the memory and the computational time for mining frequent item sets from transaction databases. A fast method for computing the intersection of two dynamic bit vectors and an algorithm for mining frequent item sets based on the scheme are presented. The proposed algorithm is also compared with some other approaches, and experimental results show that it is quite efficient in both mining time and memory usage.
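The abstract does not spell out how a dynamic bit vector is stored. One plausible reading, assumed here purely for illustration, is that zero bytes at both ends of the fixed-length vector are dropped and only the offset of the first non-zero byte plus the remaining bytes are kept. Under that assumption (and with hypothetical function names), the intersection only has to touch the overlapping byte range:

```python
def to_dbv(bits):
    """Trim zero bytes from both ends; return (offset of first kept byte, kept bytes)."""
    start, end = 0, len(bits)
    while start < end and bits[start] == 0:
        start += 1
    while end > start and bits[end - 1] == 0:
        end -= 1
    body = bytes(bits[start:end])
    return (start, body) if body else (0, b"")


def intersect(a, b):
    """AND two trimmed vectors over the byte range where both are non-empty."""
    (sa, ba), (sb, bb) = a, b
    lo = max(sa, sb)
    hi = min(sa + len(ba), sb + len(bb))
    if lo >= hi:
        return 0, b""  # no overlap, so the intersection is empty
    anded = bytes(ba[i - sa] & bb[i - sb] for i in range(lo, hi))
    start, body = to_dbv(anded)  # the result may have new zero bytes at its ends
    return (lo + start, body) if body else (0, b"")


def support(dbv):
    """Count the set bits in the kept bytes."""
    return sum(bin(byte).count("1") for byte in dbv[1])


x = to_dbv(bytes([0, 0, 0b1011, 0b0001, 0]))
y = to_dbv(bytes([0, 0, 0, 0b0001, 0b1000]))
print(intersect(x, y), support(intersect(x, y)))
```

In this sketch the memory saving comes from the trimmed ends and the time saving from skipping bytes outside the overlap; whether the paper's scheme matches this layout is an assumption.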

Ramp: Fast Frequent Itemset Mining with Efficient Bit-Vector Projection Technique

Mining frequent itemsets using the bit-vector representation approach is very efficient for dense datasets, but highly inefficient for sparse datasets due to the lack of an efficient bit-vector projection technique. In this paper, we present a novel, efficient bit-vector projection technique for sparse and dense datasets. To check the efficiency of our bit-vector projection technique, we present a new frequent itemset mining algorithm, Ramp (Real Algorithm for Mining Patterns), built upon our bit-vector projection technique. The performance of Ramp is compared with the current best (all, maximal, and closed) frequent itemset mining algorithms on benchmark datasets. Experimental results on sparse and dense datasets show that mining frequent itemsets using Ramp is faster than the current best algorithms, which shows the effectiveness of our bit-vector projection idea. We also present a new local maximal frequent itemset propagation and maximal itemset superset checking approach, FastLMFI, built upon our PBR bit-vector projection technique. Our computational experiments suggest that itemset maximality checking using FastLMFI is faster and more efficient than the previously well-known progressive focusing approach.

A New Algorithm for Discovery Maximal Frequent Itemsets

… Conference on Data …, 2008

Frequent itemset mining has played an important role in data mining research for over a decade. However, mining all frequent itemsets leads to a massive number of itemsets. Fortunately, this problem can be reduced to the mining of maximal frequent itemsets. In this paper, we propose a new method for mining maximal frequent itemsets. Our method introduces an efficient database encoding technique, a novel tree structure called PC_Tree, and the PC_Miner algorithm. The database encoding technique utilizes prime number characteristics and transforms each transaction into a positive integer that has all the properties of its items. The PC_Tree is a simple yet powerful tree structure that captures all transactions in one database scan. The PC_Miner algorithm traverses the PC_Tree to mine maximal frequent itemsets. Experiments verify the efficiency and advantages of the proposed method.
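The prime-number encoding described above can be sketched as follows, assuming that each distinct item is assigned its own prime and that a transaction is encoded as the product of its items' primes. Under that assumption, an itemset is contained in a transaction exactly when the itemset's product divides the transaction's code. The function names are hypothetical, and the PC_Tree and PC_Miner steps are not shown.

```python
def first_primes(count):
    """Return the first `count` primes by trial division against earlier primes."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def encode(transactions):
    """Assign a prime to each item and encode each transaction as a product."""
    items = sorted({item for t in transactions for item in t})
    prime_of = dict(zip(items, first_primes(len(items))))
    codes = []
    for t in transactions:
        code = 1
        for item in set(t):  # ignore repeated items within a transaction
            code *= prime_of[item]
        codes.append(code)
    return prime_of, codes


def support(itemset, prime_of, codes):
    """Count transactions whose code is divisible by the itemset's product."""
    probe = 1
    for item in set(itemset):
        probe *= prime_of[item]
    return sum(1 for code in codes if code % probe == 0)


prime_of, codes = encode([["a", "b"], ["a", "c"], ["a", "b", "c"]])
print(codes, support(["a", "b"], prime_of, codes))
```

Because prime factorizations are unique, the single integer keeps the full item membership of the transaction, which is presumably what the abstract means by an integer that "has all the properties of its items".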

IIS-Mine: A new efficient method for mining frequent itemsets

2012

A new approach to mining all frequent itemsets from a transaction database is proposed. The main features of this paper are as follows: (1) the proposed algorithm scans the database only once to construct a data structure called an inverted index structure (IIS); (2) this structure is not affected by changes in the minimum support threshold, and as a result, a rescan of the database is not required; and (3) the proposed mining algorithm, IIS-Mine, uses an efficient property of an extendable itemset, which reduces the recursiveness of mining steps without generating candidate itemsets, allowing frequent itemsets to be found quickly. We provide definitions, examples, and a theorem, the completeness and correctness of which are shown by mathematical proof. We present experiments in which the run time, memory consumption, and scalability are tested in comparison with the frequent-pattern (FP) growth algorithm as the minimum support threshold is varied. Both algorithms are evaluated by applying them to synthetic and real-world datasets. The experimental results demonstrate that IIS-Mine provides better performance than FP-growth in terms of run time and space consumption and is effective when used on dense datasets.
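Features (1) and (2) can be illustrated with a generic inverted index that maps each item to the set of transaction IDs containing it. This is only a sketch of the general idea with hypothetical names; the actual IIS layout and the extendable-itemset property used by IIS-Mine are not reproduced here.

```python
from collections import defaultdict


def build_inverted_index(transactions):
    """One pass over the data: item -> set of transaction IDs containing it."""
    index = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            index[item].add(tid)
    return dict(index)


def frequent_items(index, min_support):
    """Answer a threshold query from the index alone, without rescanning the data."""
    return sorted(item for item, tids in index.items() if len(tids) >= min_support)


def itemset_support(index, itemset):
    """Support of an itemset is the size of the intersection of its ID sets."""
    tid_sets = [index.get(item, set()) for item in itemset]
    return len(set.intersection(*tid_sets)) if tid_sets else 0


index = build_inverted_index([["a", "b"], ["b", "c"], ["a", "b", "c"], ["c"]])
print(frequent_items(index, 2), frequent_items(index, 3))
print(itemset_support(index, ["b", "c"]))
```

The index is built once and then queried with two different thresholds, which mirrors the claim that changing the minimum support does not require another database scan.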

A New Approach for Mining Frequent K-itemset

2007

Discovery of frequent itemsets is an important problem in data mining. Most previous research is based on Apriori, which suffers from the generation of a huge number of candidate itemsets and performs repeated passes to find frequent itemsets. To address this problem, we propose an algorithm for finding frequent K-itemsets in which the itemsets whose length is less than K are pruned from the database and are not considered for further processing, which reduces the size of the database and the number of comparisons to be performed. In addition, it generates 1-itemsets as a data preprocessing step, which saves time and speeds up execution. The experimental results are included.
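The pruning step can be read in more than one way. The sketch below assumes one interpretation: frequent 1-itemsets are computed first as preprocessing, infrequent items are dropped from every transaction, and any transaction left with fewer than K items is skipped because it cannot contain a K-itemset. The function name and this reading are assumptions, not the paper's stated procedure.

```python
from collections import Counter
from itertools import combinations


def frequent_k_itemsets(transactions, k, min_support):
    """Count k-itemsets after pruning infrequent items and too-short transactions."""
    # Preprocessing: support of every 1-itemset.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, count in item_counts.items() if count >= min_support}

    counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent)
        if len(kept) < k:
            continue  # too short to contain any k-itemset
        counts.update(combinations(kept, k))
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


data = [["a", "b", "c"], ["a", "b"], ["a"], ["b", "c", "d"], ["a", "b", "c"]]
print(frequent_k_itemsets(data, k=2, min_support=2))
```

Dropping infrequent items is safe because every subset of a frequent itemset must itself be frequent, so no frequent K-itemset can contain an infrequent item.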

A Hybrid Approach for Mining Frequent Itemsets

2013 IEEE International Conference on Systems, Man, and Cybernetics, 2013

Frequent itemset mining is a fundamental element with respect to many data mining problems. Recently, the PrePost algorithm has been proposed, a new algorithm for mining frequent itemsets based on the idea of N-lists. PrePost in most cases outperforms other current state-of-the-art algorithms. In this paper, we present an improved version of PrePost that uses a hash table to enhance the process of creating the N-lists associated with 1-itemsets and an improved N-list intersection algorithm. Furthermore, two new theorems are proposed for determining the "subsume index" of frequent 1-itemsets based on the N-list concept. The experimental results show that the performance of the proposed algorithm improves on that of PrePost.

An Algorithm for Mining Frequent Itemsets

2008 5th International Conference on Electrical Engineering, Computing Science and Automatic Control, 2008

In this paper, we propose a new algorithm for mining frequent itemsets, named AMFI (Algorithm for Mining Frequent Itemsets). This algorithm compresses the data while maintaining the semantics necessary for the frequent itemset mining problem, and it is more efficient than traditional compression algorithms. The efficiency of AMFI is based on a compressed vertical binary representation of the data and on a very fast support count. AMFI performs a breadth-first search through equivalence classes. We compare our proposal with an implementation using the PackBits algorithm.

Efficiently Mining Frequent Itemsets using Various Approaches: A Survey

International Journal of Computer Applications, 2012

In this paper, we present the various elementary traversal approaches for mining association rules. We start with a formal definition of an association rule and its basic algorithm. We then discuss association rule mining algorithms from several perspectives, such as the breadth-first approach, the depth-first approach, and the hybrid approach. The approaches are compared in terms of time complexity and I/O overhead on the CPU. Finally, this paper looks at the prospects of association rule mining and discusses the areas where there is scope for scalability.

A New Method to Mine Frequent Item Sets using Frequent Itemset Tree

IJSRD, 2013

Data mining is the analysis of observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner. Finding association rules in a transactional dataset is the main problem of frequent itemset mining. Many techniques have been developed to increase the efficiency of mining frequent itemsets. In this paper, we present a new method for generating frequent itemsets using a frequent itemset tree (FI-tree). We also describe an example of the new method and analyze its results using the wine dataset. The execution time of our method is better than that of the SaM method.