A Systematic Approach for Optimizing Complex Mining Tasks on Multiple Databases

DWMiner: a tool for mining frequent item sets efficiently in data warehouses

2007

This work presents DWMiner, an efficient association rule mining tool that processes data directly over a relational DBMS data warehouse. DWMiner executes the Apriori algorithm as SQL queries in parallel, using a database PC cluster middleware developed for SQL query optimization in OLAP applications. DWMiner combines intra- and inter-query parallelism to reduce the total time needed to find frequent item sets directly from a data warehouse.
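As a rough illustration of running an Apriori counting step as SQL, the sketch below finds frequent item pairs with a self-join and a `HAVING` filter. It is not DWMiner's query set and involves no parallelism or cluster middleware; the `sales(tid, item)` table, the function name `frequent_pairs_sql`, and the use of SQLite are all assumptions made for the example, and each `(tid, item)` row is assumed to be unique.

```python
import sqlite3

# Hypothetical sample data: one row per (transaction id, item).
ROWS = [
    (1, "bread"), (1, "milk"), (1, "eggs"),
    (2, "bread"), (2, "milk"),
    (3, "milk"), (3, "eggs"),
    (4, "bread"), (4, "milk"),
]

def frequent_pairs_sql(rows, min_support):
    """Return (item_a, item_b, support) for pairs meeting min_support."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (tid INTEGER, item TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    query = """
        SELECT a.item, b.item, COUNT(*) AS support
        FROM sales AS a
        JOIN sales AS b
          ON a.tid = b.tid AND a.item < b.item   -- each pair counted once
        -- Apriori pruning: both items must be frequent on their own.
        WHERE a.item IN (SELECT item FROM sales
                         GROUP BY item HAVING COUNT(*) >= ?)
          AND b.item IN (SELECT item FROM sales
                         GROUP BY item HAVING COUNT(*) >= ?)
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?
        ORDER BY a.item, b.item
    """
    result = conn.execute(
        query, (min_support, min_support, min_support)
    ).fetchall()
    conn.close()
    return result

pairs = frequent_pairs_sql(ROWS, 2)
```

Larger itemsets would need one further join per level, which is where query optimization and parallel execution of the kind the abstract mentions would matter.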

Apriori: a promising data warehouse tool for finding frequent itemset and to define

2016

A data warehouse is no longer only a source of information or a place to store historical data; it now plays a vital role as a solution in today's fast-growing, competitive world of IT and business. Information that is kept in the proper place can be retrieved quickly, efficiently, and easily, so it is necessary to classify the data before placing it into the data warehouse. It is also necessary to analyze and correlate the associations between data items, both those already available and those newly added. In this paper we present an algorithm that finds frequent itemsets in a given database; these itemsets are then used to classify items and to find associations between them. The paper also shows how to use a correlation factor to determine the actual strength of the bond between items.

Efficient Data Mining for Frequent Itemsets in Dynamic and Distributed Databases

2003

Data mining is one of the central activities associated with understanding and exploiting the world of digital data. It is the mechanized process of modeling large databases by discovering useful patterns. A frequent itemset is a pattern describing a relevant subset of the data, and a collection of frequent itemsets is particularly useful because it is an extremely compact model of the database. Discovering frequent itemsets in large databases is usually a hard computational task, which can be even harder when data is dynamic and distributed. Applying traditional algorithms to such data results in high communication overhead, excessive waste of CPU and I/O resources, and privacy violations, and often fails to meet the stringent response times required by what is essentially an interactive process of exploiting the data. Hence, there is an urgent need for non-trivial algorithms that can effectively mine frequent itemsets in dynamic and distributed databases. Such algorithms are presented in this master thesis.

Performance Analysis of Frequent Itemset Mining Using Hybrid Database Representation Approach

2006 IEEE International Multitopic Conference, 2006

Frequent itemset mining is considered an important research task in data mining because of its wide applicability in real-world applications. In recent years many algorithms and techniques have been proposed for enumerating itemsets from transactional databases. Some of them work best on dense datasets, while others work best on sparse datasets. Currently no single algorithm is best for all types of datasets (sparse as well as dense). The main limitation of previous algorithms is that they depend on a single approach and do not combine the best features of multiple approaches to speed up itemset mining. In this paper, we first compare and contrast the two main itemset mining strategies on different itemset mining factors: scalability of the algorithm, item search order, dataset projection, and itemset frequency counting. Then we introduce a new hybrid strategy that combines the best features of existing strategies. Our experiments on benchmark datasets show that mining all and maximal frequent itemsets using the hybrid approach outperforms previous algorithms on almost all types of dense and sparse datasets, which shows the effectiveness of our approach.

Building a Data Mining Query Optimizer

In this paper, we describe our research into building an optimizer for association rule queries. We present a framework for the query processor and report on the progress of our research so far. An extended SQL syntax is used for expressing association rule queries, and query trees of operators in an extended relational algebra for their internal representation. The placement of constraints in the query tree is discussed. We have developed an efficient algorithm called CT-ITL for lower level implementation of frequent item set generation which is the most critical step of association rule mining. The performance evaluations show that our algorithm compares well with the most efficient algorithms available currently. We also discuss further steps needed to reach our goal of integrating the optimizer with database systems. Keywords: knowledge discovery and data mining, query optimization, association rules, frequent item sets.

Frequent itemset mining in multirelational databases

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2009

This paper proposes a new approach to mine multirelational databases. Our approach is based on the representation of a multirelational database as a set of trees. Tree mining techniques can then be applied to identify frequent patterns in this kind of database. We propose two alternative schemes for representing a multirelational database as a set of trees. The frequent patterns that can be identified in such a set of trees can be used as the basis for other multirelational data mining techniques, such as association rules, classification, or clustering.

Mining frequent itemsets in large data warehouses: a novel approach proposed for sparse data sets

Intelligent Data Engineering …, 2007

Proposing efficient techniques for discovery of useful information and valuable knowledge from very large databases and data warehouses has attracted the attention of many researchers in the field of data mining. The well-known Association Rule Mining (ARM) algorithm, Apriori, searches for frequent itemsets (i.e., sets of items with an acceptable support) by scanning the whole database repeatedly to count the frequency of each candidate itemset. Most of the methods proposed to improve the efficiency of the Apriori algorithm attempt to count the frequency of each itemset without re-scanning the database. However, these methods rarely propose any solution to reduce the complexity of the inevitable enumerations that are inherent in the problem. In this paper, we propose a new algorithm for mining frequent itemsets and also association rules. The algorithm computes the frequency of itemsets in an efficient manner. Only a single scan of the database is required in this algorithm. The data is encoded into a compressed form and stored in main memory within a suitable data structure. The proposed algorithm works in an iterative manner, and in each iteration, the time required to measure the frequency of an itemset is reduced further (i.e., checking the frequency of n-dimensional candidate itemsets is much faster than checking those of n-1 dimensions). The efficiency of our algorithm is evaluated using artificial and real-life datasets. Experimental results indicate that our algorithm is more efficient than existing algorithms.
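For reference, the repeated-scan behaviour of the baseline Apriori algorithm described in this abstract can be sketched as follows. This is a minimal textbook-style version, not the single-scan algorithm the paper proposes; the function name `apriori` and the sample transactions are assumptions made for the example.

```python
from itertools import combinations

# Hypothetical sample transactions.
TRANSACTIONS = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "milk"},
]

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every distinct item as a 1-itemset.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # One full pass over the database per level: this is the
        # repeated scanning the abstract refers to.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any
        # candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a in level:
            for b in level:
                union = a | b
                if len(union) == k and all(
                    frozenset(s) in level for s in combinations(union, k - 1)
                ):
                    candidates.add(union)
    return frequent

result = apriori(TRANSACTIONS, 2)
```

Each iteration of the `while` loop rescans every transaction, so the number of database passes grows with the size of the longest frequent itemset.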

Itemset Mining over Large Transactional Tables on the Relational Databases

Most itemset mining approaches are memory-based and run outside the database. When we deal with a data warehouse, however, the tables are far too large to copy into memory. In addition, a pure SQL approach is quite inefficient. In fact, such implementations rarely take advantage of database programming, even though RDBMS vendors offer many features for controlling and managing the data. We propose a pattern growth mining approach that uses database programming to find all frequent itemsets. The main idea is to avoid one-at-a-time record retrieval from the database, saving the cost of copying, process context switching, expensive joins, and table reconstruction. The empirical evaluation of our approach shows that it runs competitively with the best-known SQL-based itemset mining implementations. Our performance evaluation was made with SQL Server 2000 (v.8) and T-SQL on several synthetic datasets.

IIS-Mine: A new efficient method for mining frequent itemsets

2012

A new approach to mine all frequent itemsets from a transaction database is proposed. The main features of this paper are as follows: (1) the proposed algorithm scans the database only once to construct a data structure called an inverted index structure (IIS); (2) this structure is not affected by a change in the minimum support threshold, and as a result, a rescan of the database is not required; and (3) the proposed mining algorithm, IIS-Mine, uses an efficient property of an extendable itemset, which reduces the recursiveness of mining steps without generating candidate itemsets, allowing frequent itemsets to be found quickly. We have provided definitions, examples, and a theorem, the completeness and correctness of which are shown by mathematical proof. We present experiments in which the run time, memory consumption, and scalability are tested in comparison with a frequent-pattern (FP) growth algorithm when the minimum support threshold is varied. Both algorithms are evaluated by applying them to synthetic and real-world datasets. The experimental results demonstrate that IIS-Mine provides better performance than FP-growth in terms of run time and space consumption and is effective when used on dense datasets.
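The general idea of an inverted index built in one scan, and reused when the support threshold changes, can be sketched as below. This is a generic item-to-transaction-id index mined by set intersection, not the IIS-Mine algorithm or its extendable-itemset property, whose details the abstract does not give; the names `build_inverted_index` and `mine` and the sample data are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample transactions.
TRANSACTIONS = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "milk"},
]

def build_inverted_index(transactions):
    """Single pass over the data: map each item to the ids of the
    transactions that contain it."""
    index = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            index[item].add(tid)
    return dict(index)

def mine(index, min_support):
    """Depth-first mining by intersecting transaction-id sets.
    Returns {itemset_tuple: support}; the index itself is never modified."""
    results = {}
    entries = sorted(index.items(), key=lambda kv: kv[0])

    def extend(prefix, prefix_tids, remaining):
        for i, (item, tids) in enumerate(remaining):
            new_tids = tids if prefix_tids is None else prefix_tids & tids
            if len(new_tids) >= min_support:
                itemset = prefix + (item,)
                results[itemset] = len(new_tids)
                # Only extend with items that sort after the current one.
                extend(itemset, new_tids, remaining[i + 1:])

    extend((), None, entries)
    return results

index = build_inverted_index(TRANSACTIONS)  # built once
loose = mine(index, 2)    # threshold 2
strict = mine(index, 3)   # threshold changed, same index, no rescan
```

Because the index stores complete transaction-id sets rather than counts filtered by a threshold, the same structure serves any minimum support value.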