Sequential pattern mining -- approaches and algorithms

A Survey on Different Approaches for Sequential Pattern Mining

In data mining, extracting sequential patterns from very large databases is useful in many applications. Most sequential pattern mining algorithms work on static data, meaning the database is assumed not to change. However, databases in today’s real-world applications are not static; they are incremental databases to which new transactions are added at intervals. When the database is updated, such algorithms must be executed again on the whole sequence database, so they are not appropriate, and algorithms with an incremental approach should be modelled and used instead. This paper analyses existing approaches to sequential pattern mining; the survey should be helpful in forming a new model or improving an existing approach to handle incremental databases and obtain sequential patterns from them.

Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach

IEEE Transactions on Knowledge and Data Engineering, 2004

Sequential pattern mining is an important data mining problem with broad applications. However, it is also a difficult problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Most of the previously developed sequential pattern mining methods, such as GSP, explore a candidate generation-and-test approach [1] to reduce the number of candidates to be examined. However, this approach may not be efficient in mining large sequence databases having numerous patterns and/or long patterns. In this paper, we propose a projection-based, sequential pattern-growth approach for efficient mining of sequential patterns. In this approach, a sequence database is recursively projected into a set of smaller projected databases, and sequential patterns are grown in each projected database by exploring only locally frequent fragments. Based on an initial study of the pattern growth-based sequential pattern mining, FreeSpan [8], we propose a more efficient method, called PrefixSpan, which offers ordered growth and reduced projected databases. To further improve the performance, a pseudoprojection technique is developed in PrefixSpan. A comprehensive performance study shows that PrefixSpan, in most cases, outperforms the Apriori-based algorithm GSP, FreeSpan, and SPADE [29] (a sequential pattern mining algorithm that adopts vertical data format), and PrefixSpan integrated with pseudoprojection is the fastest among all the tested algorithms. Furthermore, this mining methodology can be extended to mining sequential patterns with user-specified constraints. The high promise of the pattern-growth approach may lead to its further extension toward efficient mining of other kinds of frequent patterns, such as frequent substructures.
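As a rough illustration of the projection idea described in this abstract, the sketch below mines frequent patterns by recursively projecting the database on each frequent item. It is a simplification, not the authors' implementation: each sequence is assumed to be a list of single items rather than a list of itemsets, and the names `prefixspan`, `mine`, and the toy database are hypothetical. Projected databases are stored as (sequence index, offset) pairs instead of copied suffixes, which is the spirit of pseudoprojection.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern_tuple: support} for all frequent sequential patterns.

    Assumption: every sequence is a list of single items (no itemsets).
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence_index, start_offset) pairs, so suffixes
        # are referenced in place rather than copied.
        counts = defaultdict(int)
        for seq_idx, start in projected:
            # Count each item at most once per projected suffix.
            for item in set(sequences[seq_idx][start:]):
                counts[item] += 1

        for item, support in sorted(counts.items()):
            if support < min_support:
                continue  # only locally frequent items are grown
            pattern = prefix + (item,)
            results[pattern] = support

            # Project each suffix just past the first occurrence of item.
            new_projected = []
            for seq_idx, start in projected:
                seq = sequences[seq_idx]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((seq_idx, pos + 1))
                        break
            mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


if __name__ == "__main__":
    toy_db = [list("abc"), list("acb"), list("ab")]  # hypothetical data
    for pattern, support in sorted(prefixspan(toy_db, 2).items()):
        print(pattern, support)
```

With a minimum support of 2, the toy database yields the patterns a, b, c, a→b, and a→c. Each recursive call only looks at the suffixes that follow the current prefix, which is why no global candidate set is ever generated.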

Comparative Study of Various Sequential Pattern Mining Algorithms

International Journal of Computer Applications, 2014

Sequential pattern mining represents an important class of data mining problems with a wide range of applications. It is a very challenging problem because it requires careful scanning of a combinatorially large number of possible subsequence patterns. Broadly, sequential pattern mining algorithms can be classified into three types: Apriori-based approaches, pattern-growth algorithms, and early-pruning algorithms. These algorithms have further classifications and extensions. A detailed explanation of each algorithm, along with its important features, pseudocode, advantages, and disadvantages, is given in the subsequent sections of the paper. At the end, a comparative analysis of all the algorithms and their supporting features is given in the form of a table. This paper tries to enrich the knowledge and understanding of the various approaches to sequential pattern mining.

A Framework for Mining Closed Sequential Patterns

2014

Sequential pattern mining algorithms developed so far perform well on short sequences but are inefficient at mining long sequences, since long sequences generate a large number of frequent subsequences. To mine long sequences efficiently, closed sequential pattern mining algorithms have been developed. These algorithms mine closed sequential patterns, that is, patterns that have no supersequence with the same support. Closed sequential patterns are more compact than the patterns produced by ordinary sequential pattern mining algorithms. In this paper, we propose a framework for mining closed sequential patterns by integrating the best features of SPAM and CHARM. Our algorithm is the first method that utilizes a vertical bitmap data structure for closed sequential pattern mining.

Keywords: Data Mining, Sequential Pattern Mining, Closed Sequential Pattern Mining.
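To make the definition of a closed pattern concrete, the following sketch filters an already-mined set of frequent patterns down to the closed ones. This is a naive post-processing step for illustration only; it is not the SPAM/CHARM-based framework the abstract proposes, and the function names and the example supports are assumptions.

```python
def is_subsequence(small, large):
    """True if the items of small appear in large in the same order."""
    remaining = iter(large)
    # `item in remaining` advances the iterator, which preserves order.
    return all(item in remaining for item in small)


def closed_patterns(frequent):
    """Keep patterns that have no longer supersequence with equal support.

    frequent: dict mapping pattern tuples to their support counts.
    """
    closed = {}
    for pattern, support in frequent.items():
        absorbed = any(
            len(other) > len(pattern)
            and other_support == support
            and is_subsequence(pattern, other)
            for other, other_support in frequent.items()
        )
        if not absorbed:
            closed[pattern] = support
    return closed


if __name__ == "__main__":
    # Hypothetical supports for single-item sequences.
    frequent = {
        ("a",): 3,
        ("b",): 3,
        ("c",): 2,
        ("a", "b"): 3,
        ("a", "c"): 2,
    }
    print(closed_patterns(frequent))
```

Here the five frequent patterns collapse to two closed ones, a→b and a→c, because every shorter pattern has a supersequence with the same support. The quadratic comparison is fine for a toy example but is exactly the cost that dedicated closed-pattern miners try to avoid.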

A Study of Sequential Pattern Mining Techniques

A great deal of data is available that must be screened to uncover facts, such as a history of symptoms from which a disease can be diagnosed; this is the task of knowledge discovery in databases. Filtering such data manually can take hours, days, or months. Data mining tools are therefore needed to retrieve it, and among the available techniques, sequential pattern mining is particularly useful. This paper presents a study of sequential pattern mining techniques.

IJERT-A Time and Space Efficient Algorithm for Mining Sequential Pattern

International Journal of Engineering Research and Technology (IJERT), 2014

https://www.ijert.org/a-time-and-space-efficient-algorithm-for-mining-sequential-pattern https://www.ijert.org/research/a-time-and-space-efficient-algorithm-for-mining-sequential-pattern-IJERTV3IS091040.pdf Sequential pattern mining is an important data mining technique that extracts frequent patterns from given sequences. It is used in various fields, such as medical treatment, customer shopping sequences, DNA sequences, and gene structures. Sequential pattern mining approaches are classified into two categories: the Apriori (generate-and-test) approach and the pattern-growth (divide-and-conquer) approach. In this paper, we introduce a more time- and space-efficient algorithm for sequential pattern mining, whose time and space consumption is lower than that of previous algorithms. We compare two pattern-growth algorithms: P-prefixspan, which discovers frequent sequential patterns with the probability of inter-arrival time, and the newly proposed algorithm, named Precursive. Our experiments show that the proposed algorithm is more efficient and scalable than P-prefixspan.

Incremental mining of sequential patterns: Progress and challenges

Intelligent Data Analysis, 2013

Sequential pattern mining is a vital problem with broad applications. However, it is also challenging, as a combinatorially high number of intermediate subsequences are generated and have to be critically examined. Most basic solutions assume that mining is performed on a static database, but modern databases are continuously updated and are dynamic in nature, so incremental mining of sequential patterns has become the norm. This article investigates the need for incremental mining of sequential patterns. An analytical study, focusing on their characteristics, has been made of more than twenty incremental mining algorithms, and the issues associated with each of them are discussed. We infer that the better approach is incremental mining on the progressive database. The three most relevant algorithms based on this approach are studied in depth, along with other work done in this area. This gives scope for future research directions.
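The core motivation for incremental mining can be sketched as follows: when new sequences arrive, only the increment is scanned and the stored support counts are updated, instead of re-mining the whole database. This is a deliberately minimal sketch under strong assumptions and does not correspond to any of the surveyed algorithms: the set of candidate patterns is fixed in advance, sequences are lists of single items, updates only append whole new sequences, and the class name `IncrementalSupport` is hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern appear in sequence in the same order."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


class IncrementalSupport:
    """Maintain support counts for a fixed set of candidate patterns."""

    def __init__(self, candidates):
        self.counts = {tuple(p): 0 for p in candidates}
        self.num_sequences = 0

    def add_sequences(self, new_sequences):
        # Only the newly arrived sequences are scanned; earlier ones are
        # already reflected in self.counts.
        for seq in new_sequences:
            self.num_sequences += 1
            for pattern in self.counts:
                if is_subsequence(pattern, seq):
                    self.counts[pattern] += 1

    def frequent(self, min_ratio):
        """Patterns whose support is at least min_ratio of all sequences."""
        threshold = min_ratio * self.num_sequences
        return {p: c for p, c in self.counts.items() if c >= threshold}


if __name__ == "__main__":
    tracker = IncrementalSupport([("a", "b"), ("b", "c")])
    tracker.add_sequences([list("abc"), list("ab")])  # initial batch
    print(tracker.frequent(1.0))
    tracker.add_sequences([list("bc"), list("cb")])   # later increment
    print(tracker.frequent(0.5))
```

The limitation is visible in the design: a pattern that was never tracked cannot become frequent later without rescanning old data, which is one of the issues that practical incremental algorithms have to address.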

PrefixSpan: Mining Sequential Patterns by Prefix-Projected Pattern

Sequential pattern mining discovers frequent subsequences as patterns in a sequence database. Most of the previously developed sequential pattern mining methods, such as GSP, explore a candidate generation-and-test approach [1] to reduce the number of candidates to be examined. However, this approach may not be efficient in mining large sequence databases having numerous patterns and/or long patterns. In this paper, we propose a projection-based, sequential pattern-growth approach for efficient mining of sequential patterns. In this approach, a sequence database is recursively projected into a set of smaller projected databases, and sequential patterns are grown in each projected database by exploring only locally frequent fragments. Based on an initial study of the pattern growth-based sequential pattern mining, FreeSpan, we propose a more efficient method, called PrefixSpan, which offers ordered growth and reduced projected databases. To further improve the performance, a pseudoprojection technique is developed in PrefixSpan.

SEQUENTIAL DATA MINING: EXPLORING THE REDUNDANT PATTERNS FROM SEQUENCES TO MINIMIZE THE OVERALL PATTERNS

Recent studies on discovering patterns in sequence data have had a significant impact on many aspects of data mining. The rapid growth of sequential data has created the problem of discovering meaningful patterns from sequences, and the most challenging part is finding repeating patterns with gap constraints. In this research, we propose a novel method for mining redundant patterns with gap constraints, focusing on the development of new algorithms that discover such patterns efficiently. The proposed algorithm has the following components: (1) a data-driven pattern generation approach that avoids generating unnecessary candidates for validation; (2) a back-tracking pattern search process that discovers approximate occurrences of a pattern under user-specified gap constraints; and (3) an Apriori-like deterministic pruning approach that progressively prunes patterns and ceases the search process when necessary. We propose to conduct experimental analysis on synthetic and standard data sets, and comparative analysis of the developed algorithms against state-of-the-art algorithms.
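The back-tracking search under gap constraints can be illustrated with a small sketch. It checks for exact (not approximate) occurrences, so it is simpler than the component described above and is not the authors' algorithm. The function name `occurs_with_gaps` and the definition of a gap as the number of skipped items between two consecutive matched positions, bounded by `min_gap` and `max_gap`, are assumptions made for this example.

```python
def occurs_with_gaps(pattern, sequence, min_gap, max_gap):
    """True if pattern occurs in sequence with every gap in [min_gap, max_gap].

    A gap is the number of items skipped between two consecutive matches.
    """

    def extend(pat_idx, prev_pos):
        # All pattern items matched: a valid occurrence was found.
        if pat_idx == len(pattern):
            return True
        first = prev_pos + 1 + min_gap
        last = min(prev_pos + 1 + max_gap, len(sequence) - 1)
        for pos in range(first, last + 1):
            if sequence[pos] == pattern[pat_idx] and extend(pat_idx + 1, pos):
                return True
        # No position in the allowed window works: backtrack to the caller,
        # which will try its next candidate position.
        return False

    if not pattern:
        return True
    return any(
        sequence[start] == pattern[0] and extend(1, start)
        for start in range(len(sequence))
    )


if __name__ == "__main__":
    seq = "axbabc"  # hypothetical sequence
    print(occurs_with_gaps("abc", seq, 0, 0))
    print(occurs_with_gaps("ac", seq, 0, 0))
    print(occurs_with_gaps("ac", seq, 0, 1))
```

In the example, "abc" is found contiguously starting at the second "a" only after the attempt from the first "a" fails, which is the backtracking step. The pattern "ac" is rejected when no gap is allowed but accepted once one skipped item is permitted.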

Finding Sequential Patterns from Large Sequence Data

2010

Data mining is the task of discovering interesting patterns from large amounts of data. There are many data mining tasks, such as classification, clustering, association rule mining, and sequential pattern mining. Sequential pattern mining finds sets of data items that occur together frequently in some sequences. By extracting frequent subsequences from a sequence database, it has attracted a great deal of interest in recent data mining research because it is the basis of many applications, such as web user analysis, stock trend prediction, DNA sequence analysis, finding linguistic patterns in natural language texts, and using the history of symptoms to predict certain kinds of disease. Given the diversity of these applications, it may not be possible to apply a single sequential pattern model to all of them; each application may require a unique model and solution. A number of research projects have been established in recent years to develop meaningful sequential pattern models and efficient algorithms for mining these patterns. In this paper, we provide a brief theoretical overview of three types of sequential pattern models.
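The notion of a frequent subsequence in a sequence database can be made concrete with a short support-counting sketch. It assumes the common formulation in which each sequence is an ordered list of itemsets and a pattern is contained in a sequence if its itemsets can be matched, in order, to supersets in that sequence. The function names and the toy database are hypothetical.

```python
def contains(sequence, pattern):
    """True if pattern (a list of itemsets) is a subsequence of sequence."""
    pos = 0
    for element in pattern:
        # Greedily advance to the earliest element that covers this itemset.
        while pos < len(sequence) and not set(element) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match strictly later
    return True


def support(database, pattern):
    """Number of sequences in database that contain pattern."""
    return sum(contains(seq, pattern) for seq in database)


if __name__ == "__main__":
    # Hypothetical database: each sequence is a list of itemsets.
    database = [
        [{"a"}, {"b", "c"}, {"d"}],
        [{"a", "b"}, {"c"}],
        [{"b"}, {"a"}, {"c", "d"}],
    ]
    print(support(database, [{"a"}, {"c"}]))
    print(support(database, [{"b", "c"}]))
    print(support(database, [{"a"}, {"b"}]))
```

On this toy database the pattern a→c is contained in all three sequences, while the itemset {b, c} and the pattern a→b each appear in only one. A pattern is called frequent when its support reaches a user-chosen minimum threshold.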