A Survey of High Utility Sequential Pattern Mining
Related papers
Applying the maximum utility measure in high utility sequential pattern mining
Expert Systems with Applications, 2014
High utility sequential pattern mining has recently become a popular topic because it considers the quantities, profits, and time order of items. In the existing approach, the utilities of subsequences in sequences are difficult to calculate because three kinds of utility calculation are involved. To simplify the calculation, this work presents a maximum utility measure, derived from the principle of traditional sequential pattern mining that a subsequence is counted only once per sequence. An effective upper-bound model is designed to avoid information loss during mining, and a projection-based pruning strategy is designed to obtain more accurate sequence-utility upper bounds for subsequences. An indexing strategy is also developed to quickly find the relevant sequences for prefixes, which reduces unnecessary search time. Finally, experimental results on several datasets show that the proposed approach performs well in both pruning effectiveness and execution efficiency.
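To make the maximum utility measure concrete, the following is a minimal sketch, not the paper's implementation. It assumes a quantitative sequence is a list of elements, each mapping an item to its quantity, and that a pattern is a list of itemsets. The names `max_utility`, `database_utility`, `PROFITS`, and `DB` are hypothetical. A sequence contributes the largest utility among all occurrences of the pattern, so each sequence counts once.

```python
from typing import Dict, List, Optional, Set

QSequence = List[Dict[str, int]]  # each element maps item -> quantity
Pattern = List[Set[str]]          # ordered list of itemsets

# Hypothetical toy data: unit profits and a two-sequence database.
PROFITS = {"a": 2, "b": 5, "c": 1}
DB: List[QSequence] = [
    [{"a": 1, "b": 2}, {"a": 3}, {"b": 1, "c": 4}, {"b": 3}],
    [{"b": 1}, {"a": 2}],
]


def max_utility(seq: QSequence, pattern: Pattern,
                profits: Dict[str, int]) -> Optional[int]:
    """Largest utility over all occurrences of pattern in seq, or None."""
    # best[j] = best utility of matching the first j itemsets so far.
    best: List[Optional[int]] = [0] + [None] * len(pattern)
    for element in seq:
        # Go from long prefixes to short ones so that one sequence element
        # is never matched to two pattern itemsets.
        for j in range(len(pattern), 0, -1):
            itemset = pattern[j - 1]
            if best[j - 1] is None or not all(i in element for i in itemset):
                continue
            gain = sum(element[i] * profits[i] for i in itemset)
            candidate = best[j - 1] + gain
            if best[j] is None or candidate > best[j]:
                best[j] = candidate
    return best[-1]


def database_utility(db: List[QSequence], pattern: Pattern,
                     profits: Dict[str, int]) -> int:
    """Sum of per-sequence maximum utilities over sequences with a match."""
    utilities = (max_utility(seq, pattern, profits) for seq in db)
    return sum(u for u in utilities if u is not None)


if __name__ == "__main__":
    pattern = [{"a"}, {"b"}]
    print(max_utility(DB[0], pattern, PROFITS))    # 21
    print(max_utility(DB[1], pattern, PROFITS))    # None
    print(database_utility(DB, pattern, PROFITS))  # 21
```

In the first sequence, the pattern occurs four times with utilities 7, 17, 11, and 21, and only the maximum is kept. The second sequence has no occurrence because `b` appears only before `a`.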
An Algorithm for Mining High Utility Sequential Patterns with Time Interval
Cybernetics and Information Technologies, 2019
Mining high utility sequential patterns (HUSPs) is an emerging topic in data mining that has attracted many researchers. HUSP mining algorithms extract sequential patterns with high utility (importance) from a quantitative sequence database. In real-world applications, the time intervals between elements are also very important. However, recent HUSP mining algorithms cannot extract sequential patterns with time intervals between elements. In this paper, we therefore propose an algorithm for mining high utility sequential patterns with time intervals. We consider not only the utilities of sequential patterns but also their time intervals. The sequence weight utility value is used to preserve the important downward closure property. In addition, we use four time constraints to handle time intervals in the sequence and extract more meaningful patterns. Experimental results show that our proposed method is efficient and effective in mining high utility sequential pattern with t...
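The role of a sequence-weighted utility bound can be illustrated with a short sketch. It assumes the commonly used definition, in which the bound for a pattern is the total utility of every sequence that contains it. The abstract does not name the four time constraints, so they are left out here. All names (`sequence_utility`, `contains`, `swu`, `PROFITS`, `DB`) are hypothetical.

```python
from typing import Dict, List, Set

QSequence = List[Dict[str, int]]  # each element maps item -> quantity
Pattern = List[Set[str]]          # ordered list of itemsets

# Hypothetical toy data: unit profits and a two-sequence database.
PROFITS = {"a": 2, "b": 5, "c": 1}
DB: List[QSequence] = [
    [{"a": 1, "b": 2}, {"a": 3}, {"b": 1, "c": 4}, {"b": 3}],
    [{"b": 1}, {"a": 2}],
]


def sequence_utility(seq: QSequence, profits: Dict[str, int]) -> int:
    """Total utility of a sequence: quantity times unit profit, summed."""
    return sum(qty * profits[item]
               for element in seq
               for item, qty in element.items())


def contains(seq: QSequence, pattern: Pattern) -> bool:
    """Greedy leftmost test that pattern is a subsequence of seq."""
    j = 0
    for element in seq:
        if j < len(pattern) and all(item in element for item in pattern[j]):
            j += 1
    return j == len(pattern)


def swu(db: List[QSequence], pattern: Pattern,
        profits: Dict[str, int]) -> int:
    """Upper bound: sum of utilities of all sequences containing pattern."""
    return sum(sequence_utility(seq, profits)
               for seq in db if contains(seq, pattern))


if __name__ == "__main__":
    print(swu(DB, [{"a"}], PROFITS))         # 51
    print(swu(DB, [{"a"}, {"b"}], PROFITS))  # 42
```

Extending a pattern can only shrink the set of sequences that contain it, so this bound never increases along an extension. With a minimum utility of 45, for instance, the pattern `a` followed by `b` (bound 42) and all of its extensions could be discarded without computing exact utilities.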
Efficient Mining of High Utility Sequential Pattern from Incremental Sequential Dataset
Sequential pattern mining extends frequent pattern mining to consider time regularity, and high utility sequential pattern mining (HUS) extends it further by incorporating utility to reflect business value and impact. In existing HUS mining, when new sequences are added to the database, the whole mining procedure starts again from scratch instead of mining only the incremental sequences. This wastes both time and effort. This paper therefore proposes an incremental algorithm to mine HUS from an incremental database. Experimental results show that the proposed algorithm runs faster than the existing PHUS algorithm, saving both time and effort.
Fast Utility Mining on Complex Sequences
2019
High-utility sequential pattern mining is an emerging topic in the field of Knowledge Discovery in Databases. It consists of discovering subsequences with a high utility (importance) in sequences, referred to as high-utility sequential patterns (HUSPs). HUSPs can be applied in many real-life applications, such as market basket analysis, e-commerce recommendation, click-stream analysis, and scenic route planning. For example, in economics and targeted marketing, understanding the economic behavior of consumers is quite challenging, as is finding credible and reliable information on product profitability. Several algorithms have been proposed to address this problem by efficiently mining useful utility-based sequential patterns. Nevertheless, the performance of these algorithms can be unsatisfactory in terms of runtime and memory usage because of the combinatorial explosion of the search space for low utility thresholds and large databases. Hence, this paper proposes a more efficient algorith...
Survey on Approaches for Sequential Pattern Mining and High Utility Sequential Pattern Mining
International Journal For Scientific Research and Development, 2015
Sequential pattern mining plays an important role in many applications, such as bioinformatics and consumer behaviour analysis. However, the classic frequency-based framework often yields a large number of patterns, most of which are not informative enough for business decision-making. A recent effort has therefore been to incorporate utility into the sequential pattern selection framework, so that high utility sequential patterns (frequent or infrequent) are mined that address typical business concerns, such as the dollar value associated with each pattern. This paper presents in detail the different approaches adopted in sequential pattern mining algorithms and in high utility sequential pattern mining techniques.
Discovering High Utility Episodes in Sequences
ArXiv, 2019
Sequence data, e.g., complex event sequences, is more commonly seen than other types of data (e.g., transaction data) in real-world applications. Several mining problems have been formulated for sequence data, such as sequential pattern mining, episode mining, and sequential rule mining. As one of the fundamental problems, episode mining has often been studied. The common wisdom is that discovering frequent episodes is not useful enough. In this paper, we propose an efficient utility mining approach called UMEpi: Utility Mining of high-utility Episodes from complex event sequences. We propose the concept of the remaining utility of an episode and derive a tighter upper bound, namely episode-weighted utilization (EWU), which provides better pruning. The optimized EWU-based pruning strategies thus improve mining efficiency. The search space of UMEpi w.r.t. a prefix-based lexicographic sequence tree is spanned and determined recursively for minin...
Utility Mining across Multi-Sequences with Individualized Thresholds
ACM/IMS Transactions on Data Science
Utility-oriented pattern mining is an emerging topic, since it can reveal high-utility patterns from different types of data, providing more information than traditional frequency/confidence-based pattern mining models. The utilities of various items/objects are not exactly equal in realistic situations; each item/object has its own utility or importance. In general, the user specifies a uniform minimum utility (minutil) threshold to identify the set of high-utility sequential patterns (HUSPs). This approach fails to find the interesting patterns when minutil is set extremely high or low. We first design a new utility mining framework, called USPT, for mining high-Utility Sequential Patterns across multi-sequences with individualized Thresholds. Each item in the designed framework has its own specified minimum utility threshold. Based on the lexicographic-sequential tree and the utility-array structure, the USPT framework is presented to efficiently discover the HUSPs. With...
Comprehensive Study of Weighted Sequential Pattern Mining
2013
The extensive growth of data motivates the search for meaningful patterns in huge datasets. Sequential patterns reveal interesting relationships between different items in a sequential database. In the real world, there are several applications in which specific sequences are more important than others. Traditional sequential pattern approaches suffer from two disadvantages: first, all items and sequences are treated uniformly; second, conventional algorithms generate a large number of patterns at low support. In addition, unimportant patterns with low weights can be detected. This paper addresses the problems of the traditional framework and reviews various frameworks for weighted sequential patterns. The paper also discusses how the algorithm mines sequential patterns while reducing the search space, and how a new pruning technique prunes unimportant patterns and keeps only those that lead to important and emerging patterns. A later section of the paper discusses results of si...
FMaxCloHUSM: An efficient algorithm for mining frequent closed and maximal high utility sequences
Engineering Applications of Artificial Intelligence, 2019
Mining all frequent high utility sequences (FHUS) in quantitative sequential databases (QSDBs) is a generalization of the problem of mining all frequent sequences in non-quantitative sequence databases. In the last decade, the former problem has attracted the attention of many researchers because utility-based sequences are more informative and actionable for decision-making than frequent sequences. Although utility-based sequences have many real-life applications, their number is often very large, especially for low minimum utility thresholds and long sequences. It can thus be difficult for users to analyze them, and mining utility-based sequences often requires much time and memory. To solve this problem, this paper proposes two concise representations of FHUS, each with a small cardinality, that summarize all FHUS. These representations are defined as two sets, FCHUS and FMHUS, of all frequent closed and all frequent maximal high utility sequences, respectively. To mine these concise representations efficiently, two width and depth pruning strategies are proposed for eliminating low utility sequences early, and a novel local pruning strategy named LPCHUS is proposed, which uses a new extended measure on projected databases to eliminate non-closed and non-maximal high utility sequences, as well as their extensions, early. Based on these strategies and a novel data structure in vertical format named SIDUL, an algorithm named FMaxCloHUSM is designed for efficiently mining the sets FCHUS and FMHUS, separately or simultaneously. To the best of our knowledge, this is the first algorithm for discovering these two concise representations. An experimental study conducted on both real-life and synthetic QSDBs shows that the proposed algorithm is efficient in terms of time and memory consumption, and that the developed strategies greatly reduce the search space.
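The two concise representations can be illustrated with a naive post-filter, which is not the FMaxCloHUSM algorithm and uses none of its pruning strategies or the SIDUL structure. The sketch assumes the usual definitions: a sequence is maximal if no proper super-sequence is in the mined set, and closed if no proper super-sequence in the set has the same support. Patterns are simplified to tuples of single items, and the names and support values are hypothetical.

```python
from typing import Dict, Set, Tuple

Sequence = Tuple[str, ...]

# Hypothetical mined set: pattern -> support.
MINED: Dict[Sequence, int] = {
    ("a",): 3,
    ("a", "b"): 3,
    ("a", "c"): 2,
    ("a", "b", "c"): 2,
    ("b",): 4,
}


def is_subsequence(small: Sequence, big: Sequence) -> bool:
    """True if small can be obtained from big by deleting items."""
    remaining = iter(big)
    # 'item in remaining' advances the iterator, which preserves order.
    return all(item in remaining for item in small)


def maximal(mined: Dict[Sequence, int]) -> Set[Sequence]:
    """Patterns with no proper super-sequence in the mined set."""
    return {p for p in mined
            if not any(p != q and is_subsequence(p, q) for q in mined)}


def closed(mined: Dict[Sequence, int]) -> Set[Sequence]:
    """Patterns with no proper super-sequence of equal support."""
    return {p for p, support in mined.items()
            if not any(p != q and support == mined[q] and is_subsequence(p, q)
                       for q in mined)}


if __name__ == "__main__":
    print(sorted(maximal(MINED)))  # [('a', 'b', 'c')]
    print(sorted(closed(MINED)))   # [('a', 'b'), ('a', 'b', 'c'), ('b',)]
```

This filter compares every pair of patterns, so it only shows what the two sets contain. Every maximal pattern is also closed, which is why the maximal set is the smaller summary.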
Efficient Mining of Sequential Patterns in a Sequence Database with Weight Constraint
International Journal on Recent and Innovation Trends in Computing and Communication, 2016
Sequence pattern mining is one of the essential data mining tasks with broad applications. Many sequence mining algorithms have been developed to find the set of frequent subsequences that satisfy a support threshold in a sequence database. The main problems with most of these algorithms are that they generate a huge number of sequential patterns when the support threshold is low, and that all sequence patterns are treated uniformly although real sequential patterns differ in importance. In this paper, we propose an algorithm that aims to find more interesting sequential patterns by considering the different significance of each data element in a sequence database. Unlike conventional weighted sequential pattern mining, where the weights of items are preassigned according to priority or importance, our approach sets the weights according to the real data, and the mining process considers not only the supports but also the weights of patterns. The experimental results show that the algorithm is efficient and effective in generating more interesting patterns.