An Approach of Data Mining on Compressed Transaction

Efficient Approach for Large Database Compressed In Association Mining

International Journal of Computer Applications, 2018

A large amount of data is being generated very rapidly around the world, and processing it for knowledge discovery and decision making takes considerable time and effort. Data compression is one good solution for reducing data size, which saves time when discovering useful knowledge with appropriate methods such as data mining. Data mining helps users discover interesting and useful knowledge more easily for decision-making purposes. Association rule mining has become increasingly popular in recent years because of its wide applications in fields such as stock analysis, web log mining, medical diagnosis, customer market analysis and bioinformatics. This paper focuses on association rule mining and data pre-processing with data compression. We analyse three methods: simple Apriori, partition-based Apriori, and Apriori on a compressed dataset, and compare them on the basis of minimum support, minimum confidence, number of records and execution time.
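
The comparison above rests on the classical Apriori procedure: count itemsets level by level and keep only those meeting the minimum support. The sketch below is a minimal, generic Apriori loop for reference; the function and variable names are illustrative, and the partitioned and compressed variants studied in the paper are not reproduced here.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return all itemsets whose support (fraction of transactions) >= min_support."""
    n = len(transactions)
    tx_sets = [frozenset(t) for t in transactions]

    # Level 1: frequent single items
    counts = {}
    for t in tx_sets:
        for item in t:
            counts[frozenset([item])] = counts.get(frozenset([item]), 0) + 1
    frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets whose union has size k
        prev = list(frequent)
        candidates = set()
        for a, b in combinations(prev, 2):
            union = a | b
            if len(union) == k:
                candidates.add(union)
        # Support counting over the (possibly compressed) transaction list
        counts = {c: sum(1 for t in tx_sets if c <= t) for c in candidates}
        frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}
        result.update(frequent)
        k += 1
    return result

# Example: minimum support of 0.5 on a toy market-basket database
db = [["bread", "milk"], ["bread", "butter"], ["milk", "butter", "bread"], ["milk"]]
print(apriori(db, 0.5))
```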

A Compression Based Methodology to Mine All Frequent Items

International Journal of Trend in Scientific Research and Development, 2018

Data mining is not new. The people who first discovered how to start a fire, or that the earth is round, were also discovering knowledge, which is the central idea of data mining. Data mining, also called Knowledge Discovery in Databases, is one of the latest research areas, which has emerged in response to the tsunami of data the world is facing nowadays. It has taken up the challenge of developing techniques that can help humans discover useful patterns in data. One such important technique is frequent pattern mining. This paper presents a compression-based technique for mining frequent items from a transaction data set.

An Efficient Compression Algorithm for Uncertain Databases Aimed at Mining Problems

Lecture Notes on Software Engineering, 2015

Many studies on association rule mining have focused on item sets from precise data, in which the presence or absence of items in transactions is known with certainty. In some applications, the presence or absence of items in transactions is uncertain, and the knowledge discovered from this type of data is extracted in an approximate manner. Data compression offers a good solution for reducing data size, which saves time when discovering useful knowledge. In this paper we suggest a new algorithm to compress transactions from an uncertain database, based on a modified version of the M2TQT (Mining Merged Transactions with the Quantification Table) approach and fuzzy logic concepts. The algorithm partitions the uncertain data into a set of clusters using the K-means algorithm and exploits a fuzzy membership function to classify each transaction item into one of those clusters. Finally, the modified version of M2TQT is employed to compress the classified transactions. The key idea of our algorithm is that, since uncertain data is probabilistic in nature and frequent itemsets are counted as expected values, the compressed transactions give approximate values for the itemsets' support. Experimental results show that the proposed algorithm outperforms the U-Apriori algorithm on large uncertain databases.
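
As a rough sketch of the clustering and fuzzy-classification step described above (the modified M2TQT merging stage is omitted, and all names and values are assumptions rather than details from the paper), item existential probabilities can be clustered with a small 1-D k-means and each item assigned to its best-matching cluster via an inverse-distance fuzzy membership:

```python
import random

def kmeans_1d(values, k, iters=20):
    """Very small 1-D k-means over item existential probabilities."""
    centers = random.sample(values, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sorted(centers)

def fuzzy_membership(p, centers):
    """Inverse-distance fuzzy membership of probability p to each cluster center."""
    dists = [abs(p - c) for c in centers]
    if 0.0 in dists:                      # exact hit on a center
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    inv = [1.0 / d for d in dists]
    s = sum(inv)
    return [x / s for x in inv]

# Each transaction is a list of (item, existential probability) pairs.
uncertain_db = [[("a", 0.9), ("b", 0.4)], [("a", 0.85), ("c", 0.1)], [("b", 0.5), ("c", 0.15)]]
probs = [p for tx in uncertain_db for _, p in tx]
centers = kmeans_1d(probs, k=2)

# Replace each probability by the label of its best-matching fuzzy cluster,
# so transactions with the same labelling pattern can later be merged.
labelled = [[(item, max(range(len(centers)),
                        key=lambda i: fuzzy_membership(p, centers)[i]))
             for item, p in tx] for tx in uncertain_db]
print(centers, labelled)
```

Transactions that end up with the same cluster labelling can then be merged, which is what makes the subsequent compression step possible.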

Compression-based data mining of sequential data

2007

The vast majority of data mining algorithms require the setting of many input parameters. The dangers of working with parameter-laden algorithms are twofold. First, incorrect settings may cause an algorithm to fail to find the true patterns. Second, a perhaps more insidious problem is that the algorithm may report spurious patterns that do not really exist, or greatly overestimate the significance of the reported patterns.

A New Approach for Extracting Closed Frequent Patterns and their Association Rules using Compressed Data Structure

International Journal of Computer Applications, 2013

In data mining, frequent pattern extraction is largely used for finding association rules. Generally, association rule mining approaches work bottom-up or top-down over a compressed data structure. In the past, different works have proposed different approaches to mine frequent patterns from a given database. In this paper, we propose a new approach that combines the closed and intersection approaches over a compressed data structure, using the closed approach bottom-up and the intersection approach top-down. This combined approach reduces the search time by reducing the number of database scans needed to find closed frequent patterns and their association rules. The time complexity of the proposed algorithm is lower, whereas a classical approach such as Apriori takes more time for the given items in the dataset. Experimental results show that our approach is more efficient and effective than the traditional Apriori algorithm.
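
One way to read the top-down "intersection" idea is that every closed itemset is an intersection of some set of transactions. The following sketch illustrates that property rather than the paper's compressed-data-structure algorithm: it enumerates closed frequent itemsets by iterating pairwise intersections to a fixed point.

```python
from itertools import combinations

def closed_itemsets(transactions, min_support):
    """Top-down sketch: every closed itemset is an intersection of one or more
    transactions, so we grow a set of candidate closures by pairwise intersection
    and keep those meeting the support threshold."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)
    closures = set(tx)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(closures), 2):
            inter = a & b
            if inter and inter not in closures:
                closures.add(inter)
                changed = True
    # Support of an itemset = fraction of transactions containing it
    support = {c: sum(1 for t in tx if c <= t) / n for c in closures}
    return {c: s for c, s in support.items() if s >= min_support}

db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a", "b", "c", "d"]]
for itemset, s in closed_itemsets(db, 0.5).items():
    print(sorted(itemset), s)
```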

Large Dataset Compression Approach Using Intelligent Technique

Data clustering is the process of putting similar data into groups. A clustering algorithm partitions a data set into several groups such that the similarity within a group is larger than the similarity among groups. Association rule mining is one possible method for analysing data, but association rule algorithms generate a huge number of rules, many of which are redundant. The main idea of this paper is to compress a large database by using clustering techniques together with association rule algorithms: in the first stage the database is compressed using a clustering technique, followed by an association rule algorithm. An adaptive k-means clustering algorithm is proposed together with the Apriori algorithm. Many experiments show that using the adaptive k-means algorithm and the Apriori algorithm together gives a better compression ratio and a smaller compressed file size than using either algorithm alone. Several experiments were made on databases of different sizes. The Apriori algorithm increases the compression ratio of the adaptive k-means algorithm when they are used together, but it takes more compression time than adaptive k-means alone. These algorithms are presented and their results are compared.
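
A heavily simplified illustration of the "cluster first, then compress" idea follows: a tiny k-means over binary transaction vectors, with each transaction stored as a cluster representative plus a difference set. This is not the paper's adaptive k-means procedure, and all names and choices are assumptions.

```python
def cluster_compress(transactions, k=2, iters=10):
    """Illustrative only: cluster binary transaction vectors with a tiny k-means,
    then store one representative per cluster plus per-transaction differences."""
    items = sorted({i for t in transactions for i in t})
    vecs = [[1.0 if i in t else 0.0 for i in items] for t in transactions]
    centers = vecs[:k]                                  # naive initialisation
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vecs:
            idx = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
            groups[idx].append(v)
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[i]
                   for i, g in enumerate(groups)]
    reps = [{items[j] for j, x in enumerate(c) if x >= 0.5} for c in centers]
    # Store each transaction as (cluster id, symmetric difference from representative)
    compressed = []
    for t in transactions:
        idx = min(range(k), key=lambda c: len(set(t) ^ reps[c]))
        compressed.append((idx, set(t) ^ reps[idx]))
    return reps, compressed

reps, comp = cluster_compress([["a", "b"], ["a", "b", "c"], ["d"], ["d", "e"]])
print(reps, comp)
```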

Mining compression sequential patterns

2012

Compression-based pattern mining has been successfully applied to many data mining tasks. We propose an approach based on the minimum description length principle to extract sequential patterns that compress a database of sequences well. We show that mining compressing patterns is NP-hard and belongs to the class of inapproximable problems. We propose two heuristic algorithms for mining compressing patterns. The first uses a two-phase approach similar to Krimp for itemset data. To overcome the performance cost of the required candidate generation, we propose GoKrimp, an effective greedy algorithm that mines compressing patterns directly. We conduct an empirical study on six real-life datasets to compare the proposed algorithms by run time, compressibility, and classification accuracy when the patterns found are used as features for SVM classifiers.
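
The core of a Krimp-style approach is a two-part minimum description length score: a pattern set is good if encoding the database with it, plus encoding the pattern table itself, takes few bits. The snippet below is a deliberately simplified stand-in for that score, not the exact Krimp or GoKrimp encoding.

```python
import math

def description_length(cover_usage, pattern_lengths):
    """Simplified two-part MDL score in the spirit of Krimp-style compression.
    `cover_usage` maps pattern -> number of times it is used in the cover,
    `pattern_lengths` maps pattern -> number of items/events it contains.
    L(D | CT) uses Shannon code lengths derived from pattern usage; the
    pattern-table term here simply charges each pattern for its own length."""
    total_usage = sum(cover_usage.values())
    code_len = {p: -math.log2(u / total_usage) for p, u in cover_usage.items()}
    data_bits = sum(u * code_len[p] for p, u in cover_usage.items())
    table_bits = sum(pattern_lengths[p] * code_len[p] for p in cover_usage)
    return data_bits + table_bits

# A pattern set compresses well when frequently used patterns are long:
usage = {("a", "b", "c"): 40, ("d",): 10, ("e",): 5}
lengths = {p: len(p) for p in usage}
print(round(description_length(usage, lengths), 2), "bits")
```

A greedy miner in the spirit of GoKrimp would repeatedly add the pattern whose inclusion lowers this total the most.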

Compact transaction database for efficient frequent pattern mining

2005 IEEE International Conference on Granular Computing, 2005

Mining frequent patterns is one of the fundamental and essential operations in many data mining applications, such as discovering association rules. In this paper, we propose an innovative approach to generating compact transaction databases for efficient frequent pattern mining. It uses a compact tree structure, called the CT-tree, to compress the original transactional data. This allows the CT-Apriori algorithm, which is revised from the classical Apriori algorithm, to generate frequent patterns quickly by skipping the initial database scan and reducing a great amount of I/O time per database scan. Empirical evaluations on both synthetic and real-world databases show that our approach is effective, efficient and promising, and that both the storage space requirement and the mining time can be decreased dramatically.
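
The essence of a compact transaction database is that duplicate transactions are stored once with a multiplicity, so later support counting touches each distinct pattern only once. The paper's CT-tree holds these patterns in a prefix tree; the flat counter below is the simplest stand-in for the same idea.

```python
from collections import Counter

def compact_database(transactions):
    """Merge identical transactions into (itemset, count) pairs.
    The CT-tree in the paper stores these compactly in a prefix tree;
    a flat Counter is the simplest stand-in for the same idea."""
    return Counter(frozenset(t) for t in transactions)

def weighted_support(compact_db, itemset):
    """Support counting over the compact form: each stored pattern
    contributes its multiplicity instead of being re-read per duplicate."""
    itemset = frozenset(itemset)
    total = sum(compact_db.values())
    hits = sum(cnt for tx, cnt in compact_db.items() if itemset <= tx)
    return hits / total

db = [["a", "b"], ["a", "b"], ["a", "b"], ["a", "c"], ["b", "c"]]
cdb = compact_database(db)
print(dict(cdb))
print(weighted_support(cdb, ["a", "b"]))   # 3 of 5 transactions -> 0.6
```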

Compressed Frequent Pattern Tree

The use of data mining is increasing very rapidly, as the daily analysis of transaction databases keeps growing. In such data, various items occur frequently in the same patterns. In data mining there are a large number of algorithms available for finding frequent patterns. In the existing system, the algorithms used are Apriori and FP-Growth, which are very time consuming and not efficient. In the proposed system we use a more compact data structure named the Compressed FP-Tree, and we propose a new algorithm, CT-PRO, which uses it. The proposed algorithm is much more efficient in terms of performance.
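
For context, the sketch below builds an FP-tree-style prefix tree in which items are reordered by descending global frequency so that transactions share prefixes; the Compressed FP-Tree used by CT-PRO packs this structure further, which is not reproduced here.

```python
class Node:
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}

def build_prefix_tree(transactions):
    """FP-tree-style prefix tree: items are reordered by descending global
    frequency so that common prefixes are shared and the tree stays small."""
    freq = {}
    for t in transactions:
        for i in t:
            freq[i] = freq.get(i, 0) + 1
    root = Node(None)
    for t in transactions:
        ordered = sorted(set(t), key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root

def dump(node, depth=0):
    """Print the tree with per-node counts, one node per line."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        dump(child, depth + 1)

dump(build_prefix_tree([["a", "b"], ["a", "b", "c"], ["a", "c"], ["b"]]))
```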

Fuzzy Frequent Pattern Mining by Compressing Large Databases

International Journal of Advance Engineering and Research Development, 2015

The task of extracting useful and interesting knowledge from large data is called data mining. It has many aspects, such as clustering, classification, anomaly detection and association rule mining. Among these, association rule mining has gained a lot of interest among researchers. Some applications of association mining include analysis of stock databases, mining of web data, diagnosis in the medical domain, and analysis of customer behaviour. In the past, many algorithms were developed by researchers for mining frequent itemsets, but the problem is that they generate candidate itemsets. To overcome this, tree-based approaches for mining frequent patterns were developed, which perform the mining operation by constructing a tree with items on its nodes, eliminating the disadvantage of most of the earlier algorithms. The paper addresses the problem of finding frequent itemsets by compressing the fuzzy FP-tree, which confines itemsets into fuzzy regions with membership values. Applying this compression mechanism results in a compact tree structure that reduces computation time. The proposed method is compared with the conventional method to analyse its performance.
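
A common first step in fuzzy frequent pattern mining is to map each purchased quantity into overlapping fuzzy regions with membership degrees; the resulting (item.region, membership) pairs are what a fuzzy FP-tree then stores. The region boundaries below are assumptions for illustration, not values from the paper.

```python
def triangular(x, a, b, c):
    """Standard triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Illustrative fuzzy regions for a purchased quantity (boundaries are assumptions,
# not values from the paper): Low, Middle, High.
REGIONS = {"Low": (-1, 0, 5), "Middle": (2, 6, 10), "High": (7, 12, 100)}

def fuzzify(item, quantity):
    """Map (item, quantity) to fuzzy regions with membership degrees > 0;
    these (item.region, membership) pairs are what a fuzzy FP-tree stores."""
    return {f"{item}.{name}": round(triangular(quantity, *abc), 2)
            for name, abc in REGIONS.items()
            if triangular(quantity, *abc) > 0}

print(fuzzify("milk", 4))   # partly Low, mostly Middle
print(fuzzify("bread", 8))  # mostly Middle, partly High
```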