Applying the maximum utility measure in high utility sequential pattern mining

A Survey of High Utility Sequential Pattern Mining

Studies in Big Data, 2019

The problem of mining high utility sequences aims at discovering subsequences having a high utility (importance) in a quantitative sequential database. This problem is a natural generalization of several other pattern mining problems such as discovering frequent itemsets in transaction databases, frequent sequences in sequential databases, and high utility itemsets in quantitative transaction databases. To extract high utility sequences from a quantitative sequential database, both the sequential ordering between items and their utility (in terms of criteria such as purchase quantities and unit profits) are considered. High utility sequence mining has been applied in numerous applications. It is much more challenging than the aforementioned problems due to the combinatorial explosion of the search space when considering sequences, and because the utility measure of sequences does not satisfy the downward-closure property used in pattern mining to reduce the search space. This chapter introduces the problem of high utility sequence mining, the state-of-the-art algorithms, and applications, and presents related problems and research opportunities. A key contribution of the chapter is a theoretical framework for comparing the upper bounds used by high utility sequence mining algorithms. In particular, an interesting result is that an upper bound used by the popular USpan algorithm is not actually an upper bound. Consequently, USpan is an incomplete algorithm, as potentially are other algorithms that extend it.
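
To make the missing downward-closure property concrete, here is a minimal Python sketch. It assumes the commonly used maximum utility measure, in which the utility of a pattern in one quantitative sequence is the largest utility among its occurrences and the utility in the database is the sum over all sequences. The toy database, the `profit` table, and the function names are illustrative assumptions, not taken from the chapter.

```python
NEG = float("-inf")

def max_utility(pattern, qseq, profit):
    """Maximum utility of `pattern` over all its occurrences in one q-sequence.

    pattern: list of sets of items (one set per pattern itemset)
    qseq:    list of dicts mapping item -> purchased quantity
    profit:  dict mapping item -> unit profit (hypothetical values)
    Returns 0 when the pattern does not occur in the sequence.
    """
    n, m = len(pattern), len(qseq)
    # best[i][j]: best utility of matching the first i pattern itemsets
    # inside the first j itemsets of the q-sequence.
    best = [[NEG] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        best[0][j] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            skip = best[i][j - 1]
            take = NEG
            itemset = qseq[j - 1]
            if best[i - 1][j - 1] > NEG and all(x in itemset for x in pattern[i - 1]):
                gain = sum(itemset[x] * profit[x] for x in pattern[i - 1])
                take = best[i - 1][j - 1] + gain
            best[i][j] = max(skip, take)
    return best[n][m] if best[n][m] > NEG else 0

def database_utility(pattern, db, profit):
    """Sum of per-sequence maximum utilities."""
    return sum(max_utility(pattern, s, profit) for s in db)

profit = {"a": 2, "b": 5, "c": 1}
db = [
    [{"a": 1}, {"b": 2}, {"a": 3}],
    [{"a": 4, "c": 1}, {"b": 1}],
]

u_a = database_utility([{"a"}], db, profit)          # pattern <a>
u_ab = database_utility([{"a"}, {"b"}], db, profit)  # pattern <a, b>
print(u_a, u_ab)
```

In this toy database the pattern ⟨a⟩ has utility 14 while its extension ⟨a, b⟩ has utility 25. With a minimum utility threshold of 20, pruning ⟨a⟩ for being low-utility would wrongly discard ⟨a, b⟩, which is why algorithms rely on upper bounds instead of the utility itself.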

Efficient Mining of High Utility Sequential Pattern from Incremental Sequential Dataset

Sequential pattern mining extends frequent pattern mining by taking time regularity into account, and high utility sequential pattern (HUS) mining extends it further by incorporating utility to capture business value and impact. When new sequences are added to an existing database, the whole HUS mining procedure is restarted from scratch instead of mining HUS only from the incremental sequences. This wastes both time and effort. This paper therefore proposes an incremental algorithm to mine HUS from an incremental database. Experimental results show that the proposed algorithm runs faster than the existing PHUS algorithm, saving time as well as effort.

Survey on Approaches for Sequential Pattern Mining and High Utility Sequential Pattern Mining

International Journal For Scientific Research and Development, 2015

Sequential pattern mining plays an important role in many applications, such as bioinformatics and consumer behaviour analysis. However, the classic frequency-based framework often identifies many patterns, most of which are not informative enough for business decision-making. A recent effort has therefore been to incorporate utility into the sequential pattern selection framework, so that high utility (frequent or infrequent) sequential patterns are mined that address typical business concerns such as the dollar value associated with each pattern. This paper presents in detail the different approaches adopted by sequential pattern mining algorithms and by high utility sequential pattern mining techniques.

Comprehensive Study of Weighted Sequential Pattern Mining

2013

The extensive growth of data motivates the search for meaningful patterns in large datasets. Sequential patterns reveal interesting relationships between different items in a sequential database. In the real world, there are several applications in which specific sequences are more important than others. Traditional sequential pattern approaches suffer from two disadvantages: first, all items and sequences are treated uniformly; second, conventional algorithms generate a large number of patterns at low support. In addition, unimportant patterns with low weights can be detected. This paper addresses the problems of the traditional framework and reviews various frameworks for weighted sequential patterns. It also discusses how an algorithm mines sequential patterns while reducing the search space, and how a new pruning technique prunes unimportant patterns and keeps only those that lead to important and emerging patterns. A later section of the paper discusses results of si...

An Algorithm for Mining High Utility Sequential Patterns with Time Interval

Cybernetics and Information Technologies, 2019

Mining High Utility Sequential Patterns (HUSP) is an emerging topic in data mining which attracts many researchers. The HUSP mining algorithms can extract sequential patterns having high utility (importance) in a quantitative sequence database. In real world applications, the time intervals between elements are also very important. However, recent HUSP mining algorithms cannot extract sequential patterns with time intervals between elements. Thus, in this paper, we propose an algorithm for mining high utility sequential patterns with the time interval problem. We consider not only sequential patterns’ utilities, but also their time intervals. The sequence weight utility value is used to ensure the important downward closure property. Besides that, we use four time constraints for dealing with time interval in the sequence to extract more meaningful patterns. Experimental results show that our proposed method is efficient and effective in mining high utility sequential pattern with t...
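
The role of a downward-closed measure can be illustrated with a short Python sketch. It assumes that the "sequence weight utility" mentioned above corresponds to the sequence-weighted utility (SWU) commonly defined in the utility mining literature: the sum of the total utilities of all sequences that contain the pattern. The time-interval constraints of the paper are not modeled here, and the toy data and function names are illustrative assumptions.

```python
def sequence_utility(qseq, profit):
    """Total utility of one quantitative sequence."""
    return sum(q * profit[item] for itemset in qseq for item, q in itemset.items())

def contains(pattern, qseq):
    """True if `pattern` (list of item sets) occurs in order inside `qseq`."""
    k = 0
    for itemset in qseq:
        # Greedy left-to-right matching is sufficient for a containment test.
        if k < len(pattern) and all(x in itemset for x in pattern[k]):
            k += 1
    return k == len(pattern)

def swu(pattern, db, profit):
    """Sequence-weighted utility: sum of utilities of the containing sequences."""
    return sum(sequence_utility(s, profit) for s in db if contains(pattern, s))

profit = {"a": 2, "b": 5, "c": 1}  # hypothetical unit profits
db = [
    [{"a": 1}, {"b": 2}, {"a": 3}],
    [{"a": 4, "c": 1}, {"b": 1}],
]

swu_b = swu([{"b"}], db, profit)
swu_ba = swu([{"b"}, {"a"}], db, profit)
print(swu_b, swu_ba)
```

Because every sequence containing an extended pattern also contains the shorter pattern, the SWU of an extension can never exceed that of its prefix. A pattern whose SWU is below the threshold can therefore be pruned together with all of its extensions.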

Utility Mining across Multi-Sequences with Individualized Thresholds

ACM/IMS Transactions on Data Science

Utility-oriented pattern mining is an emerging topic, since it can reveal high-utility patterns from different types of data, which provides more information than the traditional frequency/confidence-based pattern mining models. The utilities of various items/objects are not exactly equal in realistic situations; each item/object has its own utility or importance. In general, the user considers a uniform minimum utility (minutil) threshold to identify the set of high-utility sequential patterns (HUSPs). Such a uniform threshold fails to find the interesting patterns when minutil is set extremely high or low. We first design a new utility mining framework named USPT for mining high-Utility Sequential Patterns across multi-sequences with individualized Thresholds. Each item in the designed framework has its own specified minimum utility threshold. Based on the lexicographic-sequential tree and the utility-array structure, the USPT framework is presented to efficiently discover the HUSPs. With...
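
A minimal Python sketch of how individualized thresholds might be applied follows. The abstract does not state how the threshold of a multi-item pattern is derived from the thresholds of its items; the sketch assumes one common convention in multiple-threshold mining, namely taking the smallest threshold among the pattern's items. The function names and numbers are hypothetical and are not USPT's actual definitions.

```python
def pattern_threshold(pattern, item_minutil):
    """Threshold of a pattern, ASSUMED to be the least threshold of its items."""
    return min(item_minutil[item] for itemset in pattern for item in itemset)

def is_high_utility(utility, pattern, item_minutil):
    """Compare an already computed pattern utility against its own threshold."""
    return utility >= pattern_threshold(pattern, item_minutil)

# Hypothetical per-item minimum utility thresholds.
item_minutil = {"a": 20, "b": 30, "c": 10}

print(is_high_utility(25, [{"a"}, {"b"}], item_minutil))  # threshold min(20, 30)
print(is_high_utility(15, [{"b"}], item_minutil))         # threshold 30
```

Under this assumption, a pattern with utility 25 that contains items `a` and `b` qualifies, whereas the single item `b` with utility 15 does not, something a single uniform threshold could not express.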

ProUM: Projection-Based Utility Mining on Sequence Data

Information Sciences

Utility is an important concept in economics. A variety of applications consider utility in real-life situations, which has led to the emergence of utility-oriented mining (also called utility mining) in the recent decade. Utility mining has attracted a great amount of attention, but most of the existing studies have been developed to deal with itemset-based data. Time-ordered sequence data is more commonly seen in real-world situations, and it differs from itemset-based data. Because they are time-consuming and require a large amount of memory, current utility mining algorithms still have limitations when dealing with sequence data. In addition, the mining efficiency of utility mining on sequence data still needs to be improved, especially for long sequences or when there is a low minimum utility threshold. In this paper, we propose an efficient Projection-based Utility Mining (ProUM) approach to discover high-utility sequential patterns from sequence data. The utility-array structure is designed to store the necessary information about sequence order and utility. ProUM can significantly improve the mining efficiency by utilizing the projection technique in generating the utility-array, and it effectively reduces memory consumption. Furthermore, a new upper bound named sequence extension utility is proposed, and several pruning strategies are applied to further improve the efficiency of ProUM. By taking utility theory into account, the derived high-utility sequential patterns carry more insightful and interesting information than other kinds of patterns. Experimental results showed that the proposed ProUM algorithm significantly outperformed the state-of-the-art algorithms in terms of execution time, memory usage, and scalability.

High Utility Pattern Mining – A Deep Review

High utility pattern mining is a recent development in the area of data mining. Mining utility patterns within the itemset share framework is difficult because the interestingness measure has no anti-monotonicity property. Former works on this problem employ a two-phase, candidate generation approach, with one exception that is, however, inefficient and does not scale to large databases. This paper reviews former implementations and strategies for mining high utility patterns in detail. We also look ahead to some strategies for mining sequential patterns.

On-Shelf Utility Mining of Sequence Data

2022

Utility mining has emerged as an important and interesting topic owing to its wide application and considerable popularity. However, conventional utility mining methods have a bias toward items that have longer on-shelf time, as they have a greater chance to generate a high utility. To eliminate the bias, the problem of on-shelf utility mining (OSUM) is introduced. In this article, we focus on the task of OSUM of sequence data, where the sequential database is divided into several partitions according to time periods and items are associated with utilities and several on-shelf time periods. To address the problem, we propose two methods, OSUM of sequence data (OSUMS) and OSUMS+, to extract on-shelf high-utility sequential patterns. For further efficiency, we also design several strategies to reduce the search space and avoid redundant calculation with two upper bounds, time prefix extension utility (TPEU) and time reduced sequence utility (TRSU). In addition, two novel data structures...

Distributed and parallel high utility sequential pattern mining

2016 IEEE International Conference on Big Data (Big Data), 2016

The problem of mining high utility sequential patterns (HUSP) has been studied recently. Existing solutions are mostly memory-based and assume that the data can fit into the main memory of a computer. However, with the advent of big data, this assumption no longer holds. Hence, existing algorithms are not applicable to big data environments, where data are often distributed and too large to be dealt with by a single machine. In this paper, we propose a new framework for mining HUSPs in big data. A distributed and parallel algorithm called BigHUSP is proposed to discover HUSPs efficiently. At its heart, BigHUSP uses multiple MapReduce-like steps to process data in parallel. We also propose a number of pruning strategies to minimize the search space in a distributed environment, and thus decrease computational and communication costs, while still maintaining correctness. Our experiments with real-life and large synthetic datasets validate the effectiveness of BigHUSP for mining HUSPs from large sequence datasets.
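
The general shape of a MapReduce-like pruning step can be sketched in a few lines of Python. This is not BigHUSP's actual procedure, which the abstract does not detail. It is an assumed, single-process illustration in which each data partition computes a local per-item upper bound (the summed utility of the sequences containing the item), the partial results are merged, and items below the threshold are pruned globally. The partitions, the `profit` table, and all names are hypothetical.

```python
from collections import Counter
from functools import reduce

def sequence_utility(qseq, profit):
    """Total utility of one quantitative sequence."""
    return sum(q * profit[item] for itemset in qseq for item, q in itemset.items())

def map_partition(partition, profit):
    """Map step: local upper bound per item, computed on one partition only."""
    local = Counter()
    for qseq in partition:
        su = sequence_utility(qseq, profit)
        # Credit the whole sequence utility once to each distinct item.
        for item in {i for itemset in qseq for i in itemset}:
            local[item] += su
    return local

def reduce_counts(left, right):
    """Reduce step: merge two partial results by summing per-item values."""
    return left + right

profit = {"a": 2, "b": 5, "c": 1}
partitions = [                         # each inner list would live on one node
    [[{"a": 1}, {"b": 2}, {"a": 3}]],
    [[{"a": 4, "c": 1}, {"b": 1}]],
]

global_bound = reduce(
    reduce_counts, (map_partition(p, profit) for p in partitions), Counter()
)
minutil = 20
promising = sorted(item for item, v in global_bound.items() if v >= minutil)
print(dict(global_bound), promising)
```

Only the small per-item summaries cross partition boundaries, not the sequences themselves, which is the kind of saving in communication cost that distributed pruning aims for.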