Constraint-Based Sequential Pattern Mining: A Pattern Growth Algorithm Incorporating Compactness, Length and Monetary

Efficient constraint-based Sequential Pattern Mining (SPM) algorithm to understand customers’ buying behaviour from time stamp-based sequence dataset

Cogent Engineering, 2015

Business strategies are formulated based on an understanding of customer needs. This requires developing a strategy to understand customer behaviour and buying patterns, both current and future. It involves, first, understanding how an organization currently perceives customer needs and, second, predicting future trends to drive growth. This article focuses on the purchase trends of customers, where the timing of a purchase is more important than the association of the items purchased, and which can be discovered with Sequential Pattern Mining (SPM) methods. Conventional SPM algorithms worked purely on frequency, identifying the more frequent patterns but suffering from challenges such as the generation of a huge number of uninteresting patterns, a lack of patterns of interest to the user, the rare item problem, etc. The article attempts a solution by developing an SPM algorithm based on various constraints such as Gap, Compactness, Item, Recency, Profitability and Length, along with the Frequency constraint. The six additional constraints are incorporated to ensure that all patterns are recently active (Recency), active for a certain time span (Compactness), profitable and indicative of the next timeline for purchase (Length-Item-Gap). The article also attempts to throw light on how proposed

Target Oriented Sequential Pattern Mining using Recency and Monetary Constraints

International Journal of Computer Applications, 2012

Many approaches to constraint-based sequential pattern mining have been proposed, and most of them focus only on frequency: if a pattern is not frequent, it is removed from further consideration. Frequency is a good indicator of the importance of a pattern, but in real life the environment may change constantly, and patterns discovered from a database may also change over time. Users' recent behaviour is therefore not necessarily the same as their past behaviour, and a pattern that occurred frequently in the past may never happen again in the future. In this paper we consider a recency constraint to overcome this problem. We also consider a monetary constraint, since effective marketing strategies depend on knowing the value of customers in terms of what they purchase periodically and how much they spend. This motivates considering the monetary value of customers in order to target profitable ones. In addition, we include the concept of mining only target-oriented sequential patterns that satisfy RFM constraints, so as to find the order of occurrence of the itemsets of interest only, in support of effective marketing decisions.
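As a rough illustration of how recency and monetary constraints can sit alongside a frequency threshold, the sketch below counts only those customers whose match of a pattern is both recent and sufficiently valuable. It is one plausible reading, not the paper's definition: the `Occurrence` record, the day-index timestamps, and the names `rfm_support` and `satisfies_rfm` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Occurrence:
    """One customer's match of a candidate pattern (hypothetical record layout)."""
    customer_id: str
    last_purchase_day: int  # day index of the final item in the matched pattern
    amount: float           # total money spent on the matched items


def rfm_support(occurrences, recency_cutoff_day, min_amount):
    """Count distinct customers whose match is recent enough and valuable enough."""
    qualifying = {
        occ.customer_id
        for occ in occurrences
        if occ.last_purchase_day >= recency_cutoff_day and occ.amount >= min_amount
    }
    return len(qualifying)


def satisfies_rfm(occurrences, recency_cutoff_day, min_amount, min_support):
    """Frequency is checked only over matches that pass the other two constraints."""
    return rfm_support(occurrences, recency_cutoff_day, min_amount) >= min_support


# Illustrative data only: c2 matched long ago, c3 spent too little.
matches = [
    Occurrence("c1", 95, 120.0),
    Occurrence("c2", 40, 300.0),
    Occurrence("c3", 99, 15.0),
    Occurrence("c4", 90, 80.0),
]
print(rfm_support(matches, recency_cutoff_day=80, min_amount=50.0))
```

With these made-up values, a pattern that four customers match is supported by only two of them once the recency and monetary filters are applied, which is the kind of pruning the abstract motivates.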

A Survey on Different Approaches for Sequential Pattern Mining

In data mining, mining sequential patterns from very large databases is useful in many applications. Most sequential pattern mining algorithms work on static data, meaning that the database is assumed not to change. But the databases in today's real-world applications are not static; they are incremental, with new transactions added at intervals. For an updated database, such an algorithm must be executed again on the whole sequence database, so those approaches are not appropriate, and algorithms with an incremental approach should be modelled and used instead. This paper analyzes existing approaches to sequential pattern mining; the survey should be helpful in forming a new model or improving an existing approach to handle incremental databases and obtain sequential patterns from them.

Survey on Approaches for Sequential Pattern Mining and High Utility Sequential Pattern Mining

International Journal For Scientific Research and Development, 2015

Sequential pattern mining plays an important role in many applications, such as bioinformatics and consumer behaviour analysis. However, the classic frequency-based framework often leads to many patterns being identified, most of which are not informative enough for business decision-making. A recent effort has therefore been to incorporate utility into the sequential pattern selection framework, so that high utility (frequent or infrequent) sequential patterns are mined which address typical business concerns such as the dollar value associated with each pattern. This paper presents in detail the different approaches adopted for sequential pattern mining algorithms as well as high utility sequential pattern mining techniques.

Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach

IEEE Transactions on Knowledge and Data Engineering, 2004

Sequential pattern mining is an important data mining problem with broad applications. However, it is also a difficult problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Most of the previously developed sequential pattern mining methods, such as GSP, explore a candidate generation-and-test approach [1] to reduce the number of candidates to be examined. However, this approach may not be efficient in mining large sequence databases having numerous patterns and/or long patterns. In this paper, we propose a projection-based, sequential pattern-growth approach for efficient mining of sequential patterns. In this approach, a sequence database is recursively projected into a set of smaller projected databases, and sequential patterns are grown in each projected database by exploring only locally frequent fragments. Based on an initial study of the pattern growth-based sequential pattern mining, FreeSpan [8], we propose a more efficient method, called PrefixSpan, which offers ordered growth and reduced projected databases. To further improve the performance, a pseudoprojection technique is developed in PrefixSpan. A comprehensive performance study shows that PrefixSpan, in most cases, outperforms the Apriori-based algorithm GSP, FreeSpan, and SPADE [29] (a sequential pattern mining algorithm that adopts vertical data format), and PrefixSpan integrated with pseudoprojection is the fastest among all the tested algorithms. Furthermore, this mining methodology can be extended to mining sequential patterns with user-specified constraints. The high promise of the pattern-growth approach may lead to its further extension toward efficient mining of other kinds of frequent patterns, such as frequent substructures.
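The projection-based pattern growth described in this abstract can be sketched in a few lines. The version below is a deliberately simplified illustration, not the authors' implementation: it assumes each sequence is a list of single items rather than itemsets, and the function name `prefixspan_sketch` and the toy database are invented for the example. Projected databases are kept as (sequence index, start offset) pointers, in the spirit of pseudoprojection, instead of copied suffixes.

```python
from collections import defaultdict


def prefixspan_sketch(sequences, min_support):
    """Return {pattern_tuple: support} for sequences made of single items."""
    results = {}

    def grow(prefix, projected):
        # projected holds one (sequence_index, start_offset) pointer per
        # sequence that contains the current prefix.
        supporters = defaultdict(set)  # item -> ids of sequences containing it
        first_pos = {}                 # (item, seq_id) -> first position at or after start
        for seq_id, start in projected:
            seq = sequences[seq_id]
            for pos in range(start, len(seq)):
                item = seq[pos]
                if seq_id not in supporters[item]:
                    supporters[item].add(seq_id)
                    first_pos[(item, seq_id)] = pos
        # Extend the prefix only with items that are frequent locally.
        for item in sorted(supporters):
            ids = supporters[item]
            if len(ids) < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = len(ids)
            # Project on the first occurrence of the item in each supporting sequence.
            grow(pattern, [(sid, first_pos[(item, sid)] + 1) for sid in sorted(ids)])

    grow((), [(i, 0) for i in range(len(sequences))])
    return results


# Toy database, for illustration only.
toy_db = [list("abcb"), list("acb"), list("ab"), list("bc")]
toy_patterns = prefixspan_sketch(toy_db, min_support=3)
print(toy_patterns)
```

Each recursive call scans only the suffixes that follow the current prefix, so no candidate sequences are generated and tested against the full database; that is the contrast with GSP-style methods that the abstract draws.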

Constraint-based sequential pattern mining: the pattern-growth methods

Journal of Intelligent Information Systems, 2007

Constraints are essential for many sequential pattern mining applications. However, there is no systematic study on constraint-based sequential pattern mining. In this paper, we investigate this issue and point out that the framework developed for constrained frequent-pattern mining does not fit our mission well. An extended framework is developed based on a sequential pattern growth methodology. Our study shows that constraints can be effectively and efficiently pushed deep into the sequential pattern mining under this new framework. Moreover, this framework can be extended to constraint-based structured pattern mining as well.

Keywords: Sequential pattern mining • Frequent pattern mining • Mining with constraints • Pattern-growth methods

This research is supported in part by NSERC Grant 312194-05, NSF Grants IIS-0308001, IIS-0513678, BDI-0515813 and National Science Foundation of China (NSFC) grants No. 60303008 and 69933010. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.

A pattern growth-based sequential pattern mining algorithm called prefixSuffixSpan

ICST Transactions on Scalable Information Systems, 2017

Sequential pattern mining is an important data mining problem widely addressed by the data mining community, with a very large field of applications. Sequential pattern mining aims at extracting a set of attributes shared across time among a large number of objects in a given database. The work presented in this paper is directed towards the general theoretical foundations of the pattern-growth approach. It supports an in-depth understanding of the pattern-growth approach, the current status of the solutions provided, and the direction of research in this area. The study is carried out on a particular class of pattern-growth algorithms in which patterns are grown by extending either the current pattern prefix or the current pattern suffix from the same position at each growth step. This study leads to a new algorithm called prefixSuffixSpan. Its correctness is proven and experiments are performed.

Comprehensive Study of Weighted Sequential Pattern Mining

2013

The extensive growth of data motivates the search for meaningful patterns within it. Sequential patterns reveal interesting relationships between different items in a sequential database. In the real world, there are several applications in which specific sequences are more important than other sequences. Traditional sequential pattern approaches suffer from two disadvantages: first, all items and sequences are treated uniformly; second, conventional algorithms generate a large number of patterns at lower support. In addition, unimportant patterns with low weights can be detected. This paper addresses the problems of the traditional framework and surveys various frameworks for weighted sequential patterns. The paper also discusses how the algorithm mines sequential patterns while reducing the search space, and how a new pruning technique prunes the unimportant patterns and keeps only those which lead to important and emerging patterns. Later section of paper discusses results of si...

A Survey of High Utility Sequential Pattern Mining

Studies in Big Data, 2019

The problem of mining high utility sequences aims at discovering subsequences having a high utility (importance) in a quantitative sequential database. This problem is a natural generalization of several other pattern mining problems such as discovering frequent itemsets in transaction databases, frequent sequences in sequential databases, and high utility itemsets in quantitative transaction databases. To extract high utility sequences from a quantitative sequential database, the sequential ordering between items and their utility (in terms of criteria such as purchase quantities and unit profits) are considered. High utility sequence mining has been applied in numerous applications. It is much more challenging than the aforementioned problems due to the combinatorial explosion of the search space when considering sequences, and because the utility measure of sequences does not satisfy the downward-closure property used in pattern mining to reduce the search space. This chapter introduces the problem of high utility sequence mining and the state-of-the-art algorithms and applications, and presents related problems and research opportunities. A key contribution of the chapter is to also provide a theoretical framework for comparing upper-bounds used by high utility sequence mining algorithms. In particular, an interesting result is that an upper-bound used by the popular USpan algorithm is not an upper-bound. The consequence is that USpan is an incomplete algorithm, and potentially other algorithms extending USpan.
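A small worked example may help show why utility lacks the downward-closure property that support enjoys. The sketch below assumes a simplified setting that is not taken from the chapter: each sequence is a list of (item, quantity) pairs, the utility of a pattern in a sequence is the maximum total of quantity times unit profit over its occurrences, and the database utility is the sum over sequences. The function names, profit table and data are all hypothetical.

```python
def sequence_utility(pattern, sequence, profit):
    """Maximum utility of `pattern` as a subsequence of `sequence` (0.0 if absent)."""
    neg_inf = float("-inf")
    # best[k] = best utility found so far for matching the first k pattern items.
    best = [0.0] + [neg_inf] * len(pattern)
    for item, qty in sequence:
        # Go from long to short prefixes so one element extends each match at most once.
        for k in range(len(pattern), 0, -1):
            if pattern[k - 1] == item and best[k - 1] != neg_inf:
                best[k] = max(best[k], best[k - 1] + qty * profit[item])
    return best[-1] if best[-1] != neg_inf else 0.0


def database_utility(pattern, database, profit):
    """Sum of the per-sequence utilities of `pattern`."""
    return sum(sequence_utility(pattern, seq, profit) for seq in database)


# Illustrative values only.
unit_profit = {"a": 1, "b": 5}
quant_db = [
    [("a", 2), ("b", 3)],
    [("a", 1), ("b", 1)],
    [("a", 4)],
]
u_a = database_utility(("a",), quant_db, unit_profit)
u_ab = database_utility(("a", "b"), quant_db, unit_profit)
print(u_a, u_ab)
```

Here the pattern (a) occurs in all three sequences yet has lower utility than its super-pattern (a, b), which occurs in only two. With a minimum utility of, say, 10, discarding (a) and everything that extends it would lose a high utility pattern, which is why these algorithms rely on upper-bounds rather than on the utility value itself for pruning.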

Comparative Study of Various Sequential Pattern Mining Algorithms

International Journal of Computer Applications, 2014

Sequential pattern mining represents an important class of data mining problems with a wide range of applications. It is a very challenging problem because it requires careful scanning of a combinatorially large number of possible subsequence patterns. Broadly, sequential pattern mining algorithms can be classified into three types, namely Apriori-based approaches, pattern-growth algorithms and early-pruning algorithms. These algorithms have further classifications and extensions. A detailed explanation of each algorithm, along with its important features, pseudocode, advantages and disadvantages, is given in the subsequent sections of the paper. At the end, a comparative analysis of all the algorithms with their supporting features is given in the form of a table. This paper tries to enrich the knowledge and understanding of various approaches to sequential pattern mining.