NUCLEAR: AN EFFICIENT METHOD FOR MINING FREQUENT ITEMSETS BASED ON KERNELS AND EXTENDABLE SETS

A Hybrid Approach for Mining Frequent Itemsets

2013 IEEE International Conference on Systems, Man, and Cybernetics, 2013

Frequent itemset mining is fundamental to many data mining problems. The recently proposed PrePost algorithm mines frequent itemsets using N-lists and in most cases outperforms other current state-of-the-art algorithms. In this paper, we present an improved version of PrePost that uses a hash table to speed up the creation of the N-lists associated with 1-itemsets, together with an improved N-list intersection algorithm. Furthermore, two new theorems are proposed for determining the "subsume index" of frequent 1-itemsets based on the N-list concept. The experimental results show that the proposed algorithm outperforms PrePost.

An Algorithm for Mining Frequent Itemsets

2008 5th International Conference on Electrical Engineering, Computing Science and Automatic Control, 2008

In this paper, we propose a new algorithm for mining frequent itemsets, named AMFI (Algorithm for Mining Frequent Itemsets). The algorithm compresses the data while maintaining the semantics needed for the frequent itemset mining problem, and it is more efficient than traditional compression algorithms. AMFI's efficiency is based on a compressed vertical binary representation of the data and on a very fast support count. AMFI performs a breadth-first search through equivalence classes. We compare our proposal with an implementation using the PackBits algorithm.
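To illustrate what a vertical binary representation with a fast support count can look like, here is a minimal Python sketch. It is not AMFI itself: the compression step and the equivalence-class search are omitted, and the names `build_vertical_bitmaps` and `support` are hypothetical. Each item is mapped to an integer bitmap with one bit per transaction, so the support of an itemset is the number of set bits in the AND of its items' bitmaps.

```python
def build_vertical_bitmaps(transactions):
    """Map each item to an int whose bit i is set when transaction i contains the item."""
    bitmaps = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(itemset, bitmaps, n_transactions):
    """Count the transactions containing every item: AND the bitmaps, then count set bits."""
    mask = (1 << n_transactions) - 1  # start with all transactions selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)  # an unknown item clears every bit
    return bin(mask).count("1")


# Toy database (an assumption for illustration only).
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
bitmaps = build_vertical_bitmaps(transactions)
support_ac = support({"a", "c"}, bitmaps, len(transactions))
support_ab = support({"a", "b"}, bitmaps, len(transactions))
```

On this toy database, `{"a", "c"}` appears in three transactions and `{"a", "b"}` in two.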

Efficient Algorithms for Mining Frequent Itemsets with Constraint

2011

An important problem in interactive data mining is "to find frequent item sets contained in a subset C of the set of all items on a given database". Reducing the database to C, or incorporating C into an algorithm for mining frequent item sets (such as Charm-L or Eclat), and then re-solving the problem is very time consuming, especially when C changes often. In this paper, we propose an efficient approach. First, the class LGA, containing the closed item sets together with their generators, is mined from the database only once. Then, whenever C changes, the class of all frequent closed item sets and their generators on C is determined quickly from LGA by our algorithm MINE_CG_CONS. From that class, the algorithm MINE_FS_CONS efficiently mines and classifies all frequent item sets satisfying the constraint. Theoretical results and experiments confirm the efficiency of our approach.

Simultaneous mining of frequent closed itemsets and their generators: Foundation and algorithm

Engineering Applications of Artificial Intelligence, 2014

Closed itemsets and their generators play an important role in frequent itemset and association rule mining. They allow a lossless representation of all frequent itemsets and association rules and facilitate mining. Some recent approaches discover frequent closed itemsets and generators separately. The Close algorithm mines them simultaneously but it needs to scan the database many times. Based on the properties and relationships of closed itemsets and generators, this study proposes GENCLOSE, an efficient algorithm for mining frequent closed itemsets and generators simultaneously. The level-wise search over an ItemsetObject-setGenerator-Tree enumerates the generators by using a necessary and sufficient condition to produce (i + 1)-item generators from i-item generators. This condition, based on transaction (object) sets that can be efficiently implemented using diffsets, is very convenient and reliably proved. In the search, pre-closed itemsets are gradually extended using three proposed extension operators. It is shown that these itemsets produce the expected closed itemsets. Extensive experiments on many benchmark databases confirm the efficiency of the proposed approach.
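The diffsets mentioned above can be illustrated with a small Python sketch. This shows only the general diffset idea, not GENCLOSE or its generator condition; the function names and the toy database are assumptions. For itemsets PX and PY sharing a prefix P, the diffset of PXY holds the transactions that contain PX but not PY, so the support of PXY is the support of PX minus the size of that diffset.

```python
def tidset(item, transactions):
    """Return the ids of the transactions that contain the item."""
    return {tid for tid, t in enumerate(transactions) if item in t}


def diffset(tids_px, tids_py):
    """d(PXY) = t(PX) minus t(PY): transactions containing PX but not PY."""
    return tids_px - tids_py


def support_from_diffset(support_px, d_pxy):
    """support(PXY) = support(PX) - |d(PXY)|."""
    return support_px - len(d_pxy)


# Toy database (an assumption for illustration only).
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
t_a = tidset("a", transactions)
t_b = tidset("b", transactions)
d_ab = diffset(t_a, t_b)
support_ab = support_from_diffset(len(t_a), d_ab)
```

The diffset is often much smaller than the full transaction set of the extended itemset, which is where the saving comes from on dense data.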

A New Approach for Mining Frequent K-itemset

2007

Discovery of frequent itemsets is an important problem in data mining. Most previous research is based on Apriori, which suffers from the generation of a huge number of candidate itemsets and performs repeated passes over the database to find frequent itemsets. To address this problem, we propose an algorithm for finding frequent K-itemsets in which itemsets whose length is less than K are pruned from the database and excluded from further processing, which reduces the database size and the number of comparisons to be performed. In addition, it generates 1-itemsets as a data preprocessing step, which saves time and speeds up execution. Experimental results are included.
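The abstract does not spell out the procedure, so the following Python sketch shows only one plausible reading of the pruning step: a database row with fewer than K items cannot contain any K-itemset, so it can be dropped before counting. The function name `frequent_k_itemsets`, the brute-force counting, and the toy database are assumptions, not the paper's algorithm.

```python
from collections import Counter
from itertools import combinations


def frequent_k_itemsets(transactions, k, min_support):
    """Return the k-itemsets whose support count is at least min_support."""
    # A row with fewer than k items cannot contain any k-itemset, so skip it.
    pruned = [t for t in transactions if len(t) >= k]
    counts = Counter()
    for t in pruned:
        # Sorting gives each itemset one canonical tuple form.
        for combo in combinations(sorted(t), k):
            counts[combo] += 1
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


# Toy database (an assumption for illustration only).
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b"}, {"a", "b", "c"}]
pairs_min2 = frequent_k_itemsets(transactions, k=2, min_support=2)
pairs_min3 = frequent_k_itemsets(transactions, k=2, min_support=3)
```

With K = 2 the single-item row `{"b"}` is never enumerated, yet the counts for all 2-itemsets are unchanged.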

A New Method to Mine Frequent Item Sets using Frequent Itemset Tree

IJSRD, 2013

Data mining is the analysis of observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner. Finding association rules in a transactional dataset is the main problem of frequent itemset mining. Many techniques have been developed to increase the efficiency of mining frequent itemsets. In this paper, we present a new method for generating frequent itemsets using a frequent itemset tree (FI-tree). We also give an example of the new method and analyze its results on the wine dataset. The execution time of our method is better than that of the SaM method.

Efficiently Mining Frequent Itemsets using Various Approaches: A Survey

International Journal of Computer Applications, 2012

In this paper we present the various elementary traversal approaches for mining association rules. We start with a formal definition of an association rule and its basic algorithm. We then discuss association rule mining algorithms from several perspectives, such as the breadth-first, depth-first and hybrid approaches. The approaches are compared in terms of time complexity and I/O overhead on CPU. Finally, this paper considers the prospects of association rule mining and discusses the areas where there is scope for scalability.

MapDiff-FI: Map different sets for frequent itemsets mining

Mining frequent sets is one of the fundamental methods in the prospering field of data mining for describing relationships between items in data sets. The size of the data sets required for discovering frequent itemsets plays an important role. In recent years, some data structures based on different sets have been proposed, which have been shown to be efficient and scalable for mining frequent itemsets. In this paper, we propose Map Different Sets (MapDiff), a novel and more efficient itemset representation for mining frequent itemsets. To evaluate the performance of MapDiff, we conducted extensive experiments comparing it with the original data sets on a variety of real and synthetic datasets from UCI and IBM. The experimental results show that the MapDiff structure can reduce the size of the datasets while keeping all the information of the original data.

Frequent Itemset Mining in High Dimensional Data: A Review

This paper provides a brief overview of the techniques used in frequent itemset mining. It discusses the search strategies used, i.e. depth-first vs. breadth-first, and the dataset representation, i.e. horizontal vs. vertical. In addition, it reviews many techniques used in several algorithms that make frequent itemset mining more efficient. These algorithms are discussed based on the proposed search strategies, which include row-enumeration vs. column-enumeration, bottom-up vs. top-down traversal, and a number of new data structures. Finally, the paper reviews the latest algorithms for mining colossal frequent itemsets/patterns, which are currently the most relevant to mining high-dimensional datasets.
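As a concrete instance of two of the choices named above, the following Python sketch converts a horizontal database (one row per transaction) into a vertical one (one transaction-id set per item) and then enumerates frequent itemsets depth-first, in the style of Eclat. It is a minimal illustration with hypothetical function names and a toy database, not an algorithm taken from the review.

```python
def to_vertical(transactions):
    """Horizontal to vertical: map each item to the set of transaction ids containing it."""
    vertical = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def depth_first_mine(prefix, items, min_support, out):
    """Extend prefix with each item in turn, recursing before moving to the next sibling.

    items is a list of (item, tidset) pairs that are all frequent under prefix.
    """
    for i, (item, tids) in enumerate(items):
        new_prefix = prefix + (item,)
        out[new_prefix] = len(tids)
        suffix = []
        for other, other_tids in items[i + 1:]:
            shared = tids & other_tids  # transactions containing the extended itemset
            if len(shared) >= min_support:
                suffix.append((other, shared))
        if suffix:
            depth_first_mine(new_prefix, suffix, min_support, out)


def mine(transactions, min_support):
    vertical = to_vertical(transactions)
    items = sorted((i, t) for i, t in vertical.items() if len(t) >= min_support)
    out = {}
    depth_first_mine((), items, min_support, out)
    return out


# Toy database (an assumption for illustration only).
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent = mine(transactions, min_support=2)
```

A breadth-first variant would instead generate all itemsets of one size before moving to the next, at the cost of holding a whole level in memory.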

FIMI’03: Workshop on frequent itemset mining implementations

Third IEEE International Conference on Data Mining Workshop on Frequent Itemset Mining Implementations, 2003

The efficiency of frequent itemset mining algorithms is determined mainly by three factors: the way candidates are generated, the data structure that is used and the implementation details. Most papers focus on the first factor, some describe the underlying data structures, but implementation details are almost always neglected. In this paper we show that the effect of implementation can be more important than the selection of the algorithm. Ideas that seem to be quite promising, may turn out to be ineffective if we ...