Sequential Pattern Mining by Pattern-Growth: Principles and Extensions

Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach

IEEE Transactions on Knowledge and Data Engineering, 2004

Sequential pattern mining is an important data mining problem with broad applications. However, it is also a difficult problem, since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Most previously developed sequential pattern mining methods, such as GSP, explore a candidate generation-and-test approach [1] to reduce the number of candidates to be examined. However, this approach may not be efficient in mining large sequence databases that have numerous patterns and/or long patterns. In this paper, we propose a projection-based, sequential pattern-growth approach for efficient mining of sequential patterns. In this approach, a sequence database is recursively projected into a set of smaller projected databases, and sequential patterns are grown in each projected database by exploring only locally frequent fragments. Based on an initial study of pattern-growth-based sequential pattern mining, FreeSpan [8], we propose a more efficient method, called PrefixSpan, which offers ordered growth and reduced projected databases. To further improve performance, a pseudoprojection technique is developed in PrefixSpan. A comprehensive performance study shows that PrefixSpan, in most cases, outperforms the Apriori-based algorithm GSP, FreeSpan, and SPADE [29] (a sequential pattern mining algorithm that adopts a vertical data format), and that PrefixSpan integrated with pseudoprojection is the fastest among all the tested algorithms. Furthermore, this mining methodology can be extended to mining sequential patterns with user-specified constraints. The high promise of the pattern-growth approach may lead to its further extension toward efficient mining of other kinds of frequent patterns, such as frequent substructures.
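The recursive projection described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes every element of a sequence is a single item (the paper handles itemsets), and the names `prefixspan`, `grow`, and `first_pos` are hypothetical. Pseudoprojection is modeled by storing a (sequence index, offset) pair instead of copying each suffix.

```python
from collections import defaultdict

def prefixspan(sequences, min_support):
    """Return {pattern_tuple: support} for all frequent sequential patterns.

    Simplifying assumption: each element of a sequence is a single item.
    """
    results = {}

    def grow(prefix, projected):
        # `projected` is a list of (sequence index, start offset) pairs.
        # The suffix sequences[i][start:] plays the role of the projected
        # database entry, without being physically copied (pseudoprojection).
        first_pos = defaultdict(dict)
        for seq_id, start in projected:
            seq = sequences[seq_id]
            for pos in range(start, len(seq)):
                # Remember only the first occurrence of each item per suffix.
                first_pos[seq[pos]].setdefault(seq_id, pos)
        for item in sorted(first_pos):
            occurrences = first_pos[item]
            if len(occurrences) < min_support:
                continue  # not locally frequent: never extended
            pattern = prefix + (item,)
            results[pattern] = len(occurrences)
            # Project on the grown prefix: continue just past the match.
            grow(pattern, [(sid, pos + 1) for sid, pos in occurrences.items()])

    grow((), [(i, 0) for i in range(len(sequences))])
    return results

db = [list("abc"), list("acb"), list("ab")]
patterns = prefixspan(db, min_support=2)
```

Only items that are frequent inside a projected database are ever extended, so no candidate that is absent from the data is generated.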

Comparative Study of Various Sequential Pattern Mining Algorithms

International Journal of Computer Applications, 2014

Sequential pattern mining represents an important class of data mining problems with a wide range of applications. It is a very challenging problem because it requires careful scanning of a combinatorially large number of possible subsequence patterns. Broadly, sequential pattern mining algorithms can be classified into three types: Apriori-based approaches, pattern-growth algorithms, and early-pruning algorithms. These algorithms have further classifications and extensions. A detailed explanation of each algorithm, along with its important features, pseudocode, advantages, and disadvantages, is given in the subsequent sections of the paper. At the end, a comparative analysis of all the algorithms and their supporting features is given in the form of a table. This paper aims to enrich the knowledge and understanding of the various approaches to sequential pattern mining.

Constraint-based sequential pattern mining: the pattern-growth methods

Journal of Intelligent Information Systems, 2007

Constraints are essential for many sequential pattern mining applications. However, there is no systematic study on constraint-based sequential pattern mining. In this paper, we investigate this issue and point out that the framework developed for constrained frequent-pattern mining does not fit our mission well. An extended framework is developed based on a sequential pattern growth methodology. Our study shows that constraints can be effectively and efficiently pushed deep into the sequential pattern mining under this new framework. Moreover, this framework can be extended to constraint-based structured pattern mining as well.

Keywords: Sequential pattern mining • Frequent pattern mining • Mining with constraints • Pattern-growth methods

This research is supported in part by NSERC Grant 312194-05, NSF Grants IIS-0308001, IIS-0513678, BDI-0515813 and National Science Foundation of China (NSFC) grants No. 60303008 and 69933010. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.
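To make the idea of pushing a constraint into pattern growth concrete, the following Python sketch prunes a prefix as soon as it violates a user-supplied predicate. It is only an illustration under stated assumptions, not the framework developed in the paper: sequences are lists of single items, the predicate `satisfies` is assumed to be anti-monotone (once a pattern fails, every extension of it fails too), and all names are hypothetical.

```python
from collections import defaultdict

def constrained_growth(sequences, min_support, satisfies):
    """Mine frequent patterns that satisfy an anti-monotone predicate.

    Assumption: if satisfies(p) is False, it is False for every extension
    of p, so the whole branch can be skipped without losing answers.
    """
    results = {}

    def grow(prefix, projected):
        first_pos = defaultdict(dict)
        for seq_id, start in projected:
            seq = sequences[seq_id]
            for pos in range(start, len(seq)):
                first_pos[seq[pos]].setdefault(seq_id, pos)
        for item in sorted(first_pos):
            occurrences = first_pos[item]
            pattern = prefix + (item,)
            # The constraint is checked before any projection is built,
            # so violating prefixes never produce a projected database.
            if len(occurrences) < min_support or not satisfies(pattern):
                continue
            results[pattern] = len(occurrences)
            grow(pattern, [(sid, pos + 1) for sid, pos in occurrences.items()])

    grow((), [(i, 0) for i in range(len(sequences))])
    return results

# Hypothetical constraint: at most two items, and item "c" is not allowed.
def short_without_c(pattern):
    return len(pattern) <= 2 and "c" not in pattern

db = [list("abc"), list("acb"), list("ab")]
constrained = constrained_growth(db, 2, short_without_c)
```

Checking the predicate inside the recursion, rather than filtering the final result, is what keeps the search from exploring branches that can never yield a valid pattern.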

Sequential Mining: Patterns and Algorithms Analysis

This paper presents and analyzes the common existing sequential pattern mining algorithms. It classifies sequential pattern mining algorithms into five broad classes: first, Apriori-based algorithms; second, algorithms based on a breadth-first search strategy; third, algorithms based on a depth-first search strategy; fourth, closed sequential pattern algorithms; and fifth, incremental pattern mining algorithms. At the end, a comparative analysis is made on the basis of the key features supported by the various algorithms. This study improves the understanding of the approaches to sequential pattern mining.

Sequential pattern mining -- approaches and algorithms

ACM Computing Surveys, 2013

Sequences of events, items or tokens occurring in an ordered metric space appear often in data, and the requirement to detect and analyse frequent subsequences is a common problem. Sequential Pattern Mining arose as a sub-field of data mining to address this problem. This paper surveys the approaches and algorithms proposed to date.

A Genetic Algorithm Based Approach to Closed Sequential Pattern Mining

2015

Introduction: Data mining has attracted the attention of the information industry and of society in recent years, due to the availability of large amounts of data and the need to convert such data into useful information and knowledge. The knowledge obtained can be utilized in different applications such as market analysis, customer retention, finance, insurance, production control and fraud detection. The concept of sequential pattern mining was first introduced by R. Agrawal and R. Srikant in [1], and aimed at finding sequential patterns in a sequence database, given a user-specified minimum support threshold. There are several applications of sequential pattern mining, including mining customer shopping sequences, DNA sequences and Web click streams. Closed sequential pattern mining was introduced to eliminate the drawbacks of sequential pattern mining algorithms. Closed sequential pattern mining produces a more compact result set than sequential pattern mining and also offers better effi...

Mining Sequences - Approaches and Analysis

Sequential pattern mining discovers, from a database of sequences, the sequential patterns that meet a user-specified minimum support, where the support of a pattern is the number of sequences that contain it. Each sequence in the database consists of a list of transactions ordered by transaction time, and each transaction is a set of items. Closed sequential pattern mining has the same capability as sequential pattern mining, but it reduces the number of redundant patterns that are generated and stored, which is much more economical. This paper presents the approaches and key features of the algorithms ClaSP, CM-ClaSP, CloSpan, and BIDE, which are used for mining closed sequential patterns, as well as the approaches and key features of the algorithms GSP, SPADE, PrefixSpan, SPAM, and LAPIN, which are used for mining sequential patterns. It shows that the number of sequences generated in closed sequential pattern mining is much smaller than the number generated by sequential pattern mining, which makes closed sequential pattern mining economical. The algorithms are compared by the total time required to find frequent sequences, the number of frequent sequences generated, and the maximum memory required.
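The definitions in this abstract (support as the number of sequences containing a pattern, transactions as sets of items, and closed patterns as a non-redundant subset) can be sketched in Python as follows. This is a brute-force illustration, not any of the named algorithms; the function names and the toy database are hypothetical. A pattern is treated as closed when no proper super-pattern in the given collection has the same support.

```python
def contains(sequence, pattern):
    """True if `pattern` is a subsequence of `sequence`.

    Both are ordered collections of item sets; each pattern element must be
    a subset of a distinct, later element of the sequence.
    """
    pos = 0
    for itemset in pattern:
        # Greedily take the earliest element that covers this item set.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

def support(database, pattern):
    """Number of sequences in the database that contain the pattern."""
    return sum(contains(seq, pattern) for seq in database)

def closed_only(patterns):
    """Drop every pattern that has a proper super-pattern of equal support.

    `patterns` maps a tuple of frozensets to its support.
    """
    return {
        p: s for p, s in patterns.items()
        if not any(q != p and t == s and contains(q, p)
                   for q, t in patterns.items())
    }

fs = frozenset
db = [
    [fs("a"), fs("bc"), fs("d")],
    [fs("a"), fs("b")],
    [fs("ab"), fs("c")],
]
candidates = [(fs("a"),), (fs("c"),), (fs("a"), fs("b")), (fs("a"), fs("c"))]
patterns = {p: support(db, p) for p in candidates}
closed = closed_only(patterns)
```

In this toy database the pattern consisting of `c` alone has the same support as `a` followed by `c`, so only the longer pattern is kept as closed.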

Sequential Pattern Mining: A Comparison between GSP, SPADE and Prefix SPAN

2015

Abstract: This paper presents a comparison between three kinds of algorithm: GSP (Generalized Sequential Pattern), SPADE (An efficient Algorithm for mining Frequent Sequences) and Prefix Span (Prefix-projected Sequential Pattern Mining). GSP is the Apriori-based horizontal formatting method, SPADE is the Apriori-based vertical formatting method, and Prefix-SPAN is the projection-based pattern-growth method. This paper gives a step-by-step explanation of each algorithm, demonstrating the number of iterations each one requires. A comparison is then made between the total time required to execute the algorithm, the count of frequent sequences found, and the maximum memory (in MB) required by GSP, SPADE and Prefix-SPAN. These attributes, i.e., total time, frequent sequences and maximum memory, are obtained using SPMF (A Sequential Pattern Mining Framework).
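The difference between the horizontal and vertical formats mentioned above can be shown with a small Python sketch. It is a simplified illustration of the id-list idea behind vertical methods such as SPADE, not the SPMF implementation; the helper names and the toy database are hypothetical, and only the "item followed later by item" join is shown.

```python
from collections import defaultdict

def to_vertical(database):
    """Convert a horizontal database into per-item id-lists.

    Each id-list holds (sequence id, element position) pairs.
    """
    idlists = defaultdict(list)
    for sid, sequence in enumerate(database):
        for eid, itemset in enumerate(sequence):
            for item in itemset:
                idlists[item].append((sid, eid))
    return idlists

def temporal_join(first, second):
    """Id-list of the 2-sequence 'first, then later second'.

    Keeps each occurrence of `second` that lies after the earliest
    occurrence of `first` in the same sequence.
    """
    earliest = {}
    for sid, eid in first:
        earliest[sid] = min(eid, earliest.get(sid, eid))
    return [(sid, eid) for sid, eid in second
            if sid in earliest and eid > earliest[sid]]

def idlist_support(idlist):
    """Support is the number of distinct sequence ids in the id-list."""
    return len({sid for sid, _ in idlist})

db = [
    [{"a"}, {"b", "c"}, {"d"}],
    [{"a"}, {"b"}],
    [{"a", "b"}, {"c"}],
]
vertical = to_vertical(db)
a_then_b = temporal_join(vertical["a"], vertical["b"])
```

With the vertical layout, the support of a longer pattern is obtained by joining id-lists, so the original database does not have to be rescanned for each candidate as it is in a horizontal method.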

PERFORMANCE COMPARISON BETWEEN PATTERN GROWTH ALGORITHMS FOR MINING SEQUENTIAL PATTERN

Prachi Batwara, Basant Verma, Ph.D

Sequential pattern mining is a very important concept in data mining that finds frequent patterns in given sequences. It is used in various domains such as medical treatments, customer shopping sequences, DNA sequences and gene structures. Sequential pattern mining approaches are classified into two categories: the Apriori (generate-and-test) approach and the pattern-growth (divide-and-conquer) approach. In this paper, we introduce a more efficient algorithm for sequential pattern mining; the time and space consumption of the proposed algorithm will be lower than that of previous algorithms. We compare two pattern-growth algorithms for sequential pattern mining: P-prefix span, which discovers frequent sequential patterns with the probability of inter-arrival time, and the newly proposed algorithm, named the Percussive algorithm. Our experiment shows that the proposed algorithm is more efficient and scalable than the P-prefix span algorithm. Keywords: Data Mining, Sequential Pattern Mining, Frequent Item set, Support count, Sequence database.

A Time & Memory Efficient Method for Extracting Sequential Patterns After Data Reduction from Original Data Set

2016

A tremendous amount of data is being collected at an increasing rate by computerized applications around the globe. The valuable information concealed in this vast data is attracting researchers from multiple disciplines to study effective approaches for deriving useful knowledge from it. Among the different data mining objectives, the mining of frequent patterns has been the focus of knowledge discovery in databases. This paper seeks an efficient algorithm for mining sequential patterns. Mining sequential patterns with time constraints, such as time gaps and a sliding time window, may improve the accuracy of mining results. However, the capability to extract time-constrained patterns was previously available only within the Apriori framework. Recent studies show that the pattern-growth methodology can speed up sequence mining. Current algorithms use a generate-candidate-and-test approach that may generate a large number of candidates for phlegmatic datasets. Many candidates don’t resemble...