Incremental mining of sequential patterns in large databases

An incremental mining algorithm for maintaining sequential patterns using pre-large sequences

2014

Mining useful information and knowledge from large databases has become an important research area in recent years. Among the classes of knowledge derived, sequential patterns in temporal transaction databases are particularly important because they can help model customer behavior. In the past, researchers usually assumed databases were static to simplify data mining problems. In real-world applications, however, new transactions are frequently added to databases. Designing an efficient and effective mining algorithm that can maintain sequential patterns as a database grows is thus important. In this paper, we propose a novel incremental mining algorithm for maintaining sequential patterns based on the concept of pre-large sequences to reduce the need for rescanning original databases.
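The abstract above does not define pre-large sequences. As the idea is commonly described, a pre-large sequence is one whose support falls between a lower and an upper support threshold, so it is not yet frequent but is tracked because it may become frequent after a few insertions. The sketch below illustrates only that classification; the function names `classify` and `classify_all` and the threshold values are assumptions for illustration, not the paper's algorithm.

```python
def classify(count, total, lower, upper):
    """Label a sequence by its support ratio (hypothetical helper).

    lower and upper are assumed support thresholds with lower <= upper.
    """
    ratio = count / total
    if ratio >= upper:
        return "large"      # frequent under the usual minimum support
    if ratio >= lower:
        return "pre-large"  # not frequent yet, but kept as a buffer
    return "small"          # not tracked


def classify_all(counts, total, lower, upper):
    """Apply classify to a dict that maps sequence -> customer count."""
    return {seq: classify(c, total, lower, upper) for seq, c in counts.items()}


# Illustrative counts over 100 customer sequences (made-up numbers).
labels = classify_all({"<a b>": 50, "<a c>": 40, "<b c>": 10}, 100, 0.3, 0.5)
```

Keeping the pre-large set alongside the large set is what lets an incremental algorithm absorb a batch of new data without going back to the original database, as long as the batch is small enough.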

Pure Incremental Approach for Sequential Pattern Mining

In data mining, extracting sequential patterns from very large databases is useful in many applications. Most sequential pattern mining algorithms work on static data, meaning the database is assumed not to change. Databases in today's real-world applications are not static, however; they grow incrementally, with new transactions added at intervals. For an updated database, such algorithms must be executed again on the whole sequence database, which makes them unsuitable; an algorithm with an incremental approach should be modelled and used instead. This paper analyses existing approaches to sequential pattern mining, and the survey is helpful in forming a new model or improving an existing approach to handle incremented databases and obtain sequential patterns from them. We also propose a model that is fully incremental, which we call the pure incremental approach. The proposed pure incremental mining is used to mine frequent sequences from a sequence database.

Mining Approach for Updating Sequential Patterns

We are given a large database of customer transactions, where each transaction consists of a customer ID, a transaction time, and the items bought in the transaction. The discovery of frequent sequences in temporal databases is an important data mining problem. Most current work assumes that the database is static, and a database update requires rediscovering all the patterns by scanning the entire old and new database. We consider the problem of the incremental mining of sequential patterns when new transactions or new customers are added to an original database. In this paper, we propose novel techniques for maintaining sequences in the presence of a) database updates, and b) user interaction (e.g. modifying mining parameters). This is a very challenging task, since such updates can invalidate existing sequences or introduce new ones.
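To make the data model concrete, the following minimal sketch groups transactions of the form (customer ID, time, items) into one time-ordered sequence per customer and computes the support of a candidate pattern as the fraction of customers whose sequence contains it. The helper names `build_sequences`, `contains` and `support`, and the sample data, are assumptions for illustration; this is the standard static support count, not the incremental technique the paper proposes.

```python
from collections import defaultdict


def build_sequences(transactions):
    """Group (customer_id, time, items) rows into one time-ordered sequence per customer."""
    by_customer = defaultdict(list)
    for customer_id, time, items in transactions:
        by_customer[customer_id].append((time, frozenset(items)))
    return {
        cid: [items for _, items in sorted(rows, key=lambda row: row[0])]
        for cid, rows in by_customer.items()
    }


def contains(sequence, pattern):
    """True if the pattern's itemsets occur in order, each within a later itemset of sequence."""
    pos = 0
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1  # greedy match: take the earliest itemset that fits
    return pos == len(pattern)


def support(sequences, pattern):
    """Fraction of customer sequences that contain the pattern."""
    if not sequences:
        return 0.0
    pattern = [frozenset(p) for p in pattern]
    hits = sum(contains(seq, pattern) for seq in sequences.values())
    return hits / len(sequences)


# Made-up sample data: three customers, rows deliberately out of time order.
transactions = [
    (1, 1, {"a"}), (1, 2, {"b", "c"}),
    (2, 1, {"a", "b"}),
    (3, 2, {"c"}), (3, 1, {"a"}),
]
sequences = build_sequences(transactions)
print(support(sequences, [{"a"}, {"c"}]))
```

Here the pattern "a, then later c" is contained in the sequences of customers 1 and 3 but not customer 2. A static miner would recompute such supports over every customer after each update, which is the cost that incremental methods try to avoid.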

Incremental mining of sequential patterns: Progress and challenges

Intelligent Data Analysis, 2013

Sequential pattern mining is a vital problem with broad applications. However, it is also challenging, because a combinatorially high number of intermediate subsequences is generated and each has to be critically examined. Most of the basic solutions assume that mining is performed on a static database. But modern databases are continuously updated and are dynamic in nature, so incremental mining of sequential patterns has become the norm. This article investigates the need for incremental mining of sequential patterns. An analytical study, focusing on their characteristics, has been made of more than twenty incremental mining algorithms. Further, we discuss the issues associated with each of them. We infer that the better approach is incremental mining on the progressive database. The three most relevant algorithms based on this approach are also studied in depth, along with the other work done in this area. This gives scope for future research directions.

A Survey on Different Approaches for Sequential Pattern Mining

In data mining, extracting sequential patterns from very large databases is useful in many applications. Most sequential pattern mining algorithms work on static data, meaning the database is assumed not to change. Databases in today’s real-world applications are not static, however; they grow incrementally, with new transactions added at intervals. For an updated database, such algorithms must be executed again on the whole sequence database, which makes them unsuitable; an algorithm with an incremental approach should be modelled and used instead. This paper analyses existing approaches to sequential pattern mining, and the survey should be helpful in forming a new model or improving an existing approach to handle incremented databases and obtain sequential patterns from them.

Incremental Mining of Sequential Patterns Using Weights

Real-life sequential databases are usually not static; they tend to grow incrementally. After an update, a frequent pattern may no longer remain frequent, while some infrequent patterns may become frequent in the updated database. Mining the sequential database from scratch every time an update occurs is not a good idea. It is better to use the knowledge of already mined sequential patterns to find the complete set of sequential patterns for the updated database, which is what an incremental mining algorithm does. The main goal of an incremental mining algorithm is to find the set of frequent patterns in significantly less time than a non-incremental mining algorithm. In this work, we apply weight constraints, in time and space, to the idea behind an existing algorithm called WSM.
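The reuse of earlier results described above can be sketched as follows, under a strong simplifying assumption: the update only adds new customer sequences and never appends to existing ones. Stored counts for previously tracked patterns are then updated by scanning the new sequences alone. The names `update_counts` and `frequent_patterns` and the sample numbers are hypothetical, and the sketch does not implement WSM or any weight constraint.

```python
def contains(sequence, pattern):
    """True if the pattern's itemsets occur in order within the sequence."""
    pos = 0
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)


def update_counts(old_counts, old_total, new_sequences):
    """Add occurrences found in the new sequences to the stored counts.

    Only patterns already present in old_counts are updated; a pattern
    that was never tracked cannot be counted without the old database.
    """
    counts = dict(old_counts)
    for pattern in counts:
        counts[pattern] += sum(contains(seq, pattern) for seq in new_sequences)
    return counts, old_total + len(new_sequences)


def frequent_patterns(counts, total, min_support):
    """Patterns whose support ratio meets the minimum support."""
    return {p for p, c in counts.items() if c / total >= min_support}


a, b = frozenset("a"), frozenset("b")
# Made-up state after mining 4 customer sequences: <a> seen in 3, <a b> in 2.
old_counts = {(a,): 3, (a, b): 2}
# Two new customers arrive.
new_sequences = [[{"c"}], [{"a"}, {"c"}]]
counts, total = update_counts(old_counts, 4, new_sequences)
still_frequent = frequent_patterns(counts, total, 0.5)
```

With a minimum support of 0.5, both patterns are frequent before the update, but only the single-item pattern remains frequent afterwards: the pattern "a, then b" drops from 2 of 4 to 2 of 6 customers. The gap this sketch leaves open, patterns that were infrequent before and are therefore untracked, is the harder part of incremental mining.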

Fast Mining of Finding Frequent Patterns in Transactional Database using Incremental Approach

International Journal of Applied Information Systems, 2015

Datasets grow in size as they are increasingly gathered by cheap and numerous information-sensing sources: mobile devices, aerial sensing, software logs, microphones, wireless sensor networks and cameras. This paper presents a structure for simply and efficiently parallelizing data mining algorithms for such huge datasets, combined with incremental mining. The MapReduce concept is used to execute the parallel FP-Growth algorithm by running Windows services in parallel. The proposed algorithm eliminates duplicated work and spurious items. It also shortens the response time to a query for the set of frequent items. The proposed algorithm is implemented by running many Windows services in parallel, and experimental results show substantial advantages: it runs 66% faster than the traditional data mining algorithm, and memory utilization is reduced by 37%.

Mining Integrated Sequential Patterns From Multiple Databases

International Journal of Data Warehousing and Mining, 2020

Existing work on multiple databases (MDBs) sequential pattern mining cannot mine frequent sequences to answer exact and historical queries from MDBs having different table structures. This article proposes the transaction id frequent sequence pattern (TidFSeq) algorithm to handle the difficult problem of mining frequent sequences from diverse MDBs. The TidFSeq algorithm transforms candidate 1-sequences to get the transaction subsequences where candidate 1-sequences occurred, as (1-sequence, its subsequence id list) or (1-sequence, position id list) tuples. Subsequent frequent i-sequences are computed using the counts of the sequence ids in each candidate i-sequence position id list tuple. An extended version of the general sequential pattern (GSP)-like candidate generate and frequency count approach is used for computing supports of itemset (I-step) and separate (S-step) sequences without repeated database scans but with transaction ids. Generated patterns answer complex queries from MDB...

A Novel Approach for Mining Relevant Frequent Patterns in an Incremental Database

IOSR Journal of Computer Engineering, 2013

Frequent pattern mining is emerging as a powerful tool for many business applications such as e-commerce, recommender systems and group decision support systems. Many techniques have been developed to mine frequent patterns. However, the results would be more useful if the degree of importance of each item were taken into consideration. For this reason, weighted frequent pattern mining algorithms have been suggested. But these algorithms deal only with static databases, whereas in reality most databases are interactive and dynamic in nature. The Incremental Weighted Frequent Pattern Mining based on Frequency Descending Order (IWFP FD) tree is used to deal with the dynamic nature of databases while pushing the weight constraints into frequent pattern mining. The Branch Sorting Method (BSM) with merge sort is used to restructure the IWFP FD tree, which makes it more convenient to mine patterns from the tree and makes the IWFP FD tree highly compact, saving memory. This tree allows frequent patterns to be mined in a single pass over the database.