COMPARATIVE STUDY OF DISTRIBUTED FREQUENT PATTERN MINING ALGORITHMS FOR BIG SALES DATA

IJERT-Novel Most Frequent Pattern Mining Approach Using Distributed Computing Environment

International Journal of Engineering Research and Technology (IJERT), 2013

https://www.ijert.org/novel-most-frequent-pattern-mining-approach-using-distributed-computing-environment https://www.ijert.org/research/novel-most-frequent-pattern-mining-approach-using-distributed-computing-environment-IJERTV2IS2293.pdf Frequent patterns, itemsets that occur frequently in a transactional dataset, play an essential role in mining associations, correlations, and many other interesting relationships among data, which leads to knowledge discovery and supports many business decision-making processes [1]. Data mining is a basic operational technique in knowledge discovery and decision making. Frequent pattern mining techniques have become necessary for massive datasets handled by distributed data mining in a distributed computing environment. This paper discusses a novel approach for an efficient and scalable distributed algorithm that generates the most frequent itemsets for Boolean, single-dimensional, single-level data mining on transactional datasets in distributed computing environments.

IJERT-A Novel Algorithm PDA (Parallel And Distributed Apriori) for Frequent Pattern Mining

International Journal of Engineering Research and Technology (IJERT), 2014

https://www.ijert.org/a-novel-algorithm-pda-parallel-and-distributed-apriori-for-frequent-pattern-mining https://www.ijert.org/research/a-novel-algorithm-pda-parallel-and-distributed-apriori-for-frequent-pattern-mining-IJERTV3IS081037.pdf Frequent itemset mining is an actively researched field of data mining. Apriori and FP-Growth are the most traditional algorithms for it. Developing a fast and efficient algorithm for frequent pattern mining is a challenging task. In this paper, we improve the efficiency of the Apriori algorithm using the transaction reduction concept to handle the big data problem: the data is partitioned into clusters, and the mining operation is performed in a parallel as well as distributed environment. The implementation is done in Hadoop. This method does not require redundant communication or computation, and it achieves load balancing so as to fully utilize the computing resources.
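The transaction reduction idea mentioned in this abstract can be illustrated with a small single-machine sketch. This is not the PDA algorithm from the paper and involves no Hadoop; the function name `apriori_with_transaction_reduction` and its parameters are hypothetical, chosen only for illustration. The sketch assumes the usual form of the rule: a transaction that contains no frequent k-itemset cannot contain a frequent (k+1)-itemset, so it can be dropped before the next pass.

```python
from itertools import combinations


def apriori_with_transaction_reduction(transactions, min_support):
    """Return {itemset: count} for every itemset with count >= min_support.

    Hypothetical helper for illustration; min_support is an absolute count.
    """
    active = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in active:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets and keep only
        # size-k unions whose (k-1)-subsets are all frequent (Apriori pruning).
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Support counting over the transactions that are still active.
        counts = {c: 0 for c in candidates}
        for t in active:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)

        # Transaction reduction: drop transactions that contain no frequent
        # k-itemset, since they cannot support any (k+1)-itemset.
        active = [t for t in active if any(f <= t for f in frequent)]
        k += 1
    return result
```

With five small market baskets and a minimum support count of 3, the sketch finds four frequent single items and four frequent pairs, and no frequent triple.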

Efficient Parallel Mining Of Frequent Itemset Using MapReduce

International Journal of Information Systems and Computer Sciences, 2019

Big data refers to extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interaction; data mining is used to dig deep into the patterns and relationships in such data. Frequent itemset mining is a data mining method that was developed for market basket analysis. This project proposes efficient data processing using the Lshfp growth algorithm, grouping similar objects into clusters identified by a group ID. Traditional data mining is based on the FP-Growth algorithm, focuses on load balancing, and is distributed among the nodes of the cluster. The process is mainly based on MapReduce, which is well supported by Hadoop. Hadoop is an efficient and popular framework that supports MapReduce and itemset mining. MapReduce consists of a map phase and a reduce phase: the map phase emits key-value pairs, and the reduce phase produces the reduced results. The approach aims to decrease network overhead and to process data efficiently.
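The map and reduce phases described in this abstract can be sketched in plain Python for the simplest mining step, counting item support. This is an in-memory simulation, not Hadoop code and not the algorithm proposed in the paper; the names `map_phase`, `shuffle`, `reduce_phase`, and `count_items` are hypothetical.

```python
from collections import defaultdict


def map_phase(transaction):
    """Map: emit one (item, 1) pair per distinct item in a transaction."""
    return [(item, 1) for item in sorted(set(transaction))]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the partial counts for one item."""
    return key, sum(values)


def count_items(transactions, min_support):
    """Run map, shuffle, and reduce, then keep items meeting min_support."""
    pairs = [pair for t in transactions for pair in map_phase(t)]
    grouped = shuffle(pairs)
    totals = dict(reduce_phase(k, v) for k, v in grouped.items())
    return {item: n for item, n in totals.items() if n >= min_support}
```

In a real cluster the map calls would run on different nodes and only the grouped pairs would cross the network, which is why reducing the number of emitted pairs lowers network overhead.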

High Performance Computation of Big Data: Performance Optimization Approach towards a Parallel Frequent Item Set Mining Algorithm for Transaction Data based on Hadoop MapReduce Framework

International Journal of Intelligent Systems and Applications

A huge amount of Big Data is constantly arriving with the rapid development of business organizations, and they are interested in extracting knowledgeable information from the collected data. Frequent item mining of Big Data helps with business decisions and with providing high-quality service. Applying a traditional frequent itemset mining algorithm to Big Data is not effective, because it leads to high computation time. Apache Hadoop MapReduce is the most popular data-intensive distributed computing framework for large-scale data applications such as data mining. In this paper, the author identifies the factors affecting the performance of frequent item mining algorithms based on Hadoop MapReduce technology and proposes an approach for optimizing the performance of large-scale frequent itemset mining. The experimental results show the potential of the proposed approach. Performance is significantly optimized for large-scale data mining with the MapReduce technique. The author believes that it is a valuable contribution to the high-performance computing of Big Data.

Mining of Association Rules on Large Database Using Distributed and Parallel Computing

Procedia Computer Science, 2016

Nowadays, due to the rapid growth of data in organizations, extensive data processing is a central concern of information technology. Mining association rules in large databases is a challenging task. The Apriori algorithm is widely used to find frequent itemsets in a database, but it is inefficient for large databases because it requires a high I/O load. This drawback of the Apriori algorithm was later addressed by many algorithms and parallel algorithms (models), but those are also unable to find frequent itemsets in large databases quickly and efficiently. Hence, a hybrid architecture is proposed that integrates distributed and parallel computing concepts. The main idea of the new architecture is to combine distributed and parallel computing in such a way that frequent itemsets can be found in large databases in less time. It also handles large databases more efficiently than existing algorithms.

Mining Distributed Frequent Itemset with Hadoop

2014

In the current scenario, there has been growing attention to distributed environments, especially in data mining. Frequent pattern mining is an active area of research. This paper presents a survey of frequent itemset mining in distributed environments. The evaluation of algorithms for frequent itemsets and association rule mining has been growing rapidly. The characteristics of present algorithms are shown through a comparison matrix, and a proposed algorithm targeting the current bottleneck is presented. The current issues of communication overhead and fault tolerance are addressed and solved by the proposed scheme. Keywords: frequent itemset, ARM, trie, distributed mining

The MapReduce Model on Cascading Platform for Frequent Itemset Mining

IJCCS (Indonesian Journal of Computing and Cybernetics Systems), 2018

The implementation of parallel algorithms has recently become a very interesting research topic. Parallelism is well suited to large-scale data processing. MapReduce is one of the parallel and distributed programming models. The implementation of parallel programming faces many difficulties. Cascading provides a simple scheme on top of the Hadoop system, which implements the MapReduce model. Frequent itemsets are the objects that appear most often in a dataset. Frequent Itemset Mining (FIM) requires complex computation and is a complicated problem when implemented on large-scale data. This paper discusses the implementation of the MapReduce model on Cascading for FIM. The experiment uses the Amazon product co-purchasing network metadata dataset. The experiment shows that the simple mechanism of Cascading can be used to solve the FIM problem. It gives time complexity O(n), which is more efficient than the nonparallel version with complexity O(n^2/m).

A Survey on Association Rule Mining Algorithm and Architecture for Distributed Processing

2014

Association rule mining is a data mining technique used to uncover previously unknown hidden patterns or rules from huge databases, usually terabytes and petabytes of data. There are many popular algorithms for mining association rules, such as Apriori, partitioning, and dynamic itemset counting. But the main drawback of these algorithms is their sequential nature. Processing large databases sequentially has many disadvantages, such as long running times and scalability and performance issues. To avoid these problems, we look to parallel or distributed association rule mining to provide scalability and better performance.
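As a small illustration of what an association rule measures, the sketch below computes the standard support and confidence of a rule X -> Y over a list of transactions. It is sequential and not taken from the survey; the function name `rule_metrics` and its parameters are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = fraction of transactions containing both sides
    confidence = fraction of antecedent transactions that also contain
                 the consequent
    """
    if not transactions:
        raise ValueError("at least one transaction is required")
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)

    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))

    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence
```

Each call scans every transaction once, which is the sequential cost that parallel and distributed variants aim to split across nodes.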

Performance Analysis of Association Rule Mining Using Hadoop

Association rule mining plays a major role in discovering associations between different items in large datasets. There are several association algorithms, among which the Apriori algorithm is the most suitable. However, the Apriori algorithm is designed to run on a single node or computer, which limits its use on large datasets. There have been various studies on parallelizing the algorithm. In this paper, Apache G-Hadoop was chosen as the distributed framework to implement the algorithm and to evaluate its performance on G-Hadoop. The performance analysis shows the most suitable platform for distributed association rule mining.

ParallelCharMax: An Effective Maximal Frequent Itemset Mining Algorithm Based on MapReduce Framework

Nowadays, the explosive growth in data collection in business and scientific areas has created the need to analyze and mine useful knowledge residing in these data. Recourse to data mining techniques seems inescapable in order to extract useful and novel patterns/models from large datasets. In this context, frequent itemsets (patterns) play an essential role in many data mining tasks that try to find interesting patterns in datasets. However, conventional approaches for mining frequent itemsets in the Big Data era encounter significant challenges when computing power and memory space are limited. This paper proposes an efficient distributed frequent itemset mining algorithm, called ParallelCharMax, that is based on a powerful sequential algorithm, called Charm, and computes the maximal frequent itemsets, which are considered perfect summaries of the frequent ones. The proposed algorithm has been implemented using the MapReduce framework. The experimental component of the study shows the efficiency and performance of the proposed algorithm compared with well-known algorithms such as MineWithRounds and HMBA.
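To make the notion of maximal frequent itemsets concrete, the sketch below filters a collection of frequent itemsets down to those with no frequent proper superset. It is a brute-force illustration of the definition only, not Charm or ParallelCharMax; the function name `maximal_itemsets` is hypothetical.

```python
def maximal_itemsets(frequent_itemsets):
    """Keep only the itemsets that have no proper superset in the input."""
    sets = [frozenset(s) for s in frequent_itemsets]
    # `s < other` is a proper-subset test on frozensets.
    return {s for s in sets if not any(s < other for other in sets)}
```

Because every subset of a frequent itemset is itself frequent, the maximal sets are enough to recover which itemsets are frequent, though not their individual support counts.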