Utility Mining across Multi-Sequences with Individualized Thresholds
Related papers
A Survey of High Utility Sequential Pattern Mining
Studies in Big Data, 2019
The problem of mining high utility sequences aims at discovering subsequences having a high utility (importance) in a quantitative sequential database. This problem is a natural generalization of several other pattern mining problems such as discovering frequent itemsets in transaction databases, frequent sequences in sequential databases, and high utility itemsets in quantitative transaction databases. To extract high utility sequences from a quantitative sequential database, the sequential ordering between items and their utility (in terms of criteria such as purchase quantities and unit profits) are considered. High utility sequence mining has been applied in numerous applications. It is much more challenging than the aforementioned problems due to the combinatorial explosion of the search space when considering sequences, and because the utility measure of sequences does not satisfy the downward-closure property used in pattern mining to reduce the search space. This chapter introduces the problem of high utility sequence mining, the state-of-the-art algorithms and their applications, and presents related problems and research opportunities. A key contribution of the chapter is a theoretical framework for comparing the upper bounds used by high utility sequence mining algorithms. In particular, an interesting result is that a measure used as an upper bound by the popular USpan algorithm is not actually an upper bound. Consequently, USpan is an incomplete algorithm, as potentially are other algorithms that extend USpan.
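The claim that sequence utility does not satisfy the downward-closure property can be illustrated with a small sketch. The database, the unit profits, and the function names below are all hypothetical, and the sketch assumes the common convention that a pattern's utility in one sequence is the maximum over its occurrences; it is not taken from the chapter.

```python
# Hypothetical unit profits (external utilities); not taken from any paper.
PROFIT = {"a": 2, "b": 5, "c": 1}

# Toy quantitative sequence database: each sequence is a list of itemsets,
# and each itemset maps an item to its purchased quantity.
DB = [
    [{"a": 1}, {"b": 1}, {"a": 3, "c": 2}],
    [{"a": 1}, {"c": 4}],
    [{"b": 2}, {"a": 1}],
]


def utility_in_sequence(pattern, qseq):
    """Maximum utility of `pattern` over all its occurrences in `qseq`.

    `pattern` is a list of tuples of items (one tuple per element).
    Returns None when the pattern does not occur in the sequence.
    """
    def best(i, j):
        # Best utility of matching pattern[i:] inside qseq[j:].
        if i == len(pattern):
            return 0
        if j == len(qseq):
            return None
        skip = best(i, j + 1)  # do not use itemset j
        take = None
        if all(item in qseq[j] for item in pattern[i]):
            rest = best(i + 1, j + 1)
            if rest is not None:
                gain = sum(qseq[j][item] * PROFIT[item] for item in pattern[i])
                take = gain + rest
        candidates = [v for v in (skip, take) if v is not None]
        return max(candidates) if candidates else None

    return best(0, 0)


def utility(pattern, db):
    """Utility of `pattern` in the database: sum over sequences containing it."""
    values = (utility_in_sequence(pattern, s) for s in db)
    return sum(v for v in values if v is not None)


print(utility([("a",)], DB))                  # 10
print(utility([("a",), ("b",)], DB))          # 7
print(utility([("a",), ("b",), ("a",)], DB))  # 13
```

With a hypothetical threshold of 12, the pattern <a, b> (utility 7) is low-utility while its extension <a, b, a> (utility 13) is high-utility, so pruning a pattern's extensions based on its utility alone could miss valid results.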
Survey on Approaches for Sequential Pattern Mining and High Utility Sequential Pattern Mining
International Journal For Scientific Research and Development, 2015
Sequential pattern mining plays an important role in many applications, such as bioinformatics and consumer behaviour analysis. However, the classic frequency-based framework often identifies many patterns, most of which are not informative enough for business decision-making. A recent effort has therefore been to incorporate utility into the sequential pattern selection framework, so that high utility (frequent or infrequent) sequential patterns are mined that address typical business concerns such as the dollar value associated with each pattern. This paper presents in detail the different approaches adopted by sequential pattern mining algorithms as well as by high utility sequential pattern mining techniques.
An Algorithm for Mining High Utility Sequential Patterns with Time Interval
Cybernetics and Information Technologies, 2019
Mining High Utility Sequential Patterns (HUSP) is an emerging topic in data mining which attracts many researchers. HUSP mining algorithms can extract sequential patterns having high utility (importance) in a quantitative sequence database. In real-world applications, the time intervals between elements are also very important. However, recent HUSP mining algorithms cannot extract sequential patterns with time intervals between elements. Thus, in this paper, we propose an algorithm for mining high utility sequential patterns with time intervals. We consider not only sequential patterns’ utilities, but also their time intervals. The sequence weight utility value is used to ensure the important downward closure property. In addition, we use four time constraints for dealing with time intervals in the sequence to extract more meaningful patterns. Experimental results show that our proposed method is efficient and effective in mining high utility sequential pattern with t...
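How a sequence-weighted utility value can restore downward closure may be sketched as follows. This uses the widely used definition (the sum of the total utilities of all sequences that contain the pattern) and ignores time constraints; the paper's own definition may differ. The data, the profits, and the names `sequence_utility`, `contains`, and `swu` are hypothetical, and patterns are simplified to one item per element.

```python
# Hypothetical unit profits and toy quantitative sequence database.
PROFIT = {"a": 2, "b": 5, "c": 1}
DB = [
    [{"a": 1}, {"b": 1}, {"a": 3, "c": 2}],
    [{"a": 1}, {"c": 4}],
    [{"b": 2}, {"a": 1}],
]


def sequence_utility(qseq):
    """Total utility of every item in one quantitative sequence."""
    return sum(q * PROFIT[item] for itemset in qseq for item, q in itemset.items())


def contains(qseq, pattern):
    """True if the items of `pattern` occur in order, one per itemset.

    Simplifying assumption: each pattern element is a single item.
    """
    pos = 0
    for item in pattern:
        # Greedily advance to the earliest itemset holding the item.
        while pos < len(qseq) and item not in qseq[pos]:
            pos += 1
        if pos == len(qseq):
            return False
        pos += 1
    return True


def swu(pattern, db):
    """Sum of whole-sequence utilities over sequences containing `pattern`."""
    return sum(sequence_utility(s) for s in db if contains(s, pattern))


print(swu(("a",), DB))           # 33
print(swu(("a", "b"), DB))       # 15
print(swu(("a", "b", "a"), DB))  # 15
```

Extending a pattern can only shrink the set of sequences that contain it, so this value never increases with extension. It also bounds the pattern's actual utility from above, which is why a pattern whose value falls below the threshold can be pruned together with all its extensions.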
Applying the maximum utility measure in high utility sequential pattern mining
Expert Systems with Applications, 2014
Recently, high utility sequential pattern mining has become a popular emerging issue because it considers the quantities, profits and time orders of items. In the existing approach, the utilities of subsequences in sequences are difficult to calculate because three kinds of utility calculations are involved. To simplify the utility calculation, this work presents a maximum utility measure, which is derived from the principle of traditional sequential pattern mining that a subsequence is counted only once per sequence. Hence, the maximum measure is used to simplify the utility calculation for subsequences in mining. Meanwhile, an effective upper-bound model is designed to avoid information loss in mining, and an effective projection-based pruning strategy is designed to obtain more accurate sequence-utility upper bounds for subsequences. An indexing strategy is also developed to quickly find the relevant sequences for prefixes in mining, so that unnecessary search time can be reduced. Finally, the experimental results on several datasets show that the proposed approach performs well in both pruning effectiveness and execution efficiency.
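The idea behind the maximum utility measure, where a subsequence contributes only once per sequence, can be sketched by listing the utility of every occurrence and keeping the largest. This is only an illustration under simplifying assumptions (one item per element, brute-force enumeration of positions); the data and the names `occurrence_utilities` and `max_utility` are hypothetical and do not reproduce the paper's algorithm.

```python
from itertools import combinations

# Hypothetical unit profits and one toy quantitative sequence,
# written as (item, quantity) pairs in time order.
PROFIT = {"a": 2, "b": 5}
QSEQ = [("a", 1), ("b", 1), ("a", 3), ("b", 2)]


def occurrence_utilities(pattern, qseq):
    """Utility of every occurrence of `pattern` (a tuple of items) in `qseq`."""
    utilities = []
    # combinations() yields strictly increasing position tuples, so each
    # tuple is a candidate order-preserving occurrence.
    for positions in combinations(range(len(qseq)), len(pattern)):
        if all(qseq[p][0] == item for p, item in zip(positions, pattern)):
            utilities.append(sum(qseq[p][1] * PROFIT[qseq[p][0]] for p in positions))
    return utilities


def max_utility(pattern, qseq):
    """The sequence contributes once: the largest occurrence utility (0 if absent)."""
    return max(occurrence_utilities(pattern, qseq), default=0)


print(occurrence_utilities(("a", "b"), QSEQ))  # [7, 12, 16]
print(max_utility(("a", "b"), QSEQ))           # 16
```

The three occurrences of <a, b> have different utilities, but only the largest is kept, mirroring how frequency-based mining counts a subsequence once per sequence.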
Efficient Mining of High Utility Sequential Pattern from Incremental Sequential Dataset
Frequent pattern mining is extended by sequential pattern mining to consider time regularity, which is further enhanced to high utility sequential pattern mining (HUS) by incorporating utility to reflect business value and impact. When new sequences are added to an existing database, the whole HUS mining procedure restarts from scratch instead of mining HUS from the incremental sequences only. This wastes time and effort. This paper therefore proposes an incremental algorithm to mine HUS from the incremental database. Experimental results show that the proposed algorithm executes faster than the existing PHUS algorithm, saving time and effort.
Discovering High Utility Episodes in Sequences
ArXiv, 2019
Sequence data, e.g., complex event sequence, is more commonly seen than other types of data (e.g., transaction data) in real-world applications. For the mining task from sequence data, several problems have been formulated, such as sequential pattern mining, episode mining, and sequential rule mining. As one of the fundamental problems, episode mining has often been studied. The common wisdom is that discovering frequent episodes is not useful enough. In this paper, we propose an efficient utility mining approach namely UMEpi: Utility Mining of high-utility Episodes from complex event sequence. We propose the concept of remaining utility of episode, and achieve a tighter upper bound, namely episode-weighted utilization (EWU), which will provide better pruning. Thus, the optimized EWU-based pruning strategies can achieve better improvements in mining efficiency. The search space of UMEpi w.r.t. a prefix-based lexicographic sequence tree is spanned and determined recursively for minin...
Efficiently mining high utility sequential patterns in static and streaming data
Intelligent Data Analysis, 2017
High utility sequential pattern (HUSP) mining has emerged as a novel topic in data mining. Although some preliminary works have been conducted on this topic, they incur the problem of producing a large search space for high utility sequential patterns. In addition, they mainly focus on mining HUSPs in static databases and do not take streaming data into account, where unbounded data come continuously and often at a high speed. To efficiently deal with both problems, we propose a novel framework for mining high utility sequential patterns over static and streaming databases. In this regard, two efficient data structures named ItemUtilLists (Item Utility Lists) and HUSP-Tree (High Utility Sequential Pattern Tree) are proposed to maintain essential information for mining HUSPs in both offline and online fashions. In addition, a novel utility model called Sequence-Suffix Utility is proposed for effectively pruning the search space in HUSP mining. We propose an algorithm named HUSP-Miner (High Utility Sequential Pattern Miner) to find HUSPs in static databases efficiently. Then, a one-pass algorithm named HUSP-Stream (High Utility Sequential Pattern mining over Data Streams) is proposed to incrementally update ItemUtilLists and HUSP-Tree online and find HUSPs over data streams. To the best of our knowledge, HUSP-Stream is the first method to find HUSPs over data streams. Experimental results on both real and synthetic datasets show that HUSP-Miner outperforms the compared algorithms substantially in terms of execution time, memory usage and number of generated candidates. The experiments also demonstrate impressive performance of HUSP-Stream to update the data structures and discover HUSPs over data streams.
International Journal of Engineering Research and Technology (IJERT), 2021
https://www.ijert.org/mining-high-utility-sequential-pattern-using-lexicographic-q-sequence-tree-and-utility-linked-list https://www.ijert.org/research/mining-high-utility-sequential-pattern-using-lexicographic-q-sequence-tree-and-utility-linked-list-IJERTCONV9IS07003.pdf High utility sequential pattern (HUSP) mining is an important field concerned with discovering high utility patterns in a sequence. It is becoming more relevant in real-life applications such as market basket analysis, e-commerce recommendation and bioinformatics. Sequential Pattern Mining (SPM) is used to extract sequential or frequent patterns from a vast database. Traditional SPM does not consider factors such as the utility or profit of products. To address this, SPM is generalized to HUSP Mining (HUSPM), which is used to discover the high utility patterns in a sequence database. Many algorithms have been proposed to find the high utility patterns of a sequence database, but the large search space leads to combinatorial explosion. This paper proposes a new algorithm for mining HUSPs, HUSP-Utility Linked List (ULL). The objective of HUSP-ULL is to discover the sequential patterns and compute the utility of each pattern in the database, keeping those that meet or exceed a predefined minimum utility threshold. HUSPM makes use of the lexicographic q-sequence and the UL (Utility Linked)-list for identifying high utility patterns. The output can be used in many applications such as e-commerce, market basket analysis, the healthcare industry, web mining, bioinformatics and mobile computing.
Fast Utility Mining on Complex Sequences
2019
High-utility sequential pattern mining is an emerging topic in the field of Knowledge Discovery in Databases. It consists of discovering subsequences having a high utility (importance) in sequences, referred to as high-utility sequential patterns (HUSPs). HUSPs can be applied to many real-life applications, such as market basket analysis, e-commerce recommendation, click-stream analysis and scenic route planning. For example, in economics and targeted marketing, understanding the economic behavior of consumers, such as finding credible and reliable information on product profitability, is quite challenging. Several algorithms have been proposed to address this problem by efficiently mining utility-based useful sequential patterns. Nevertheless, the performance of these algorithms can be unsatisfying in terms of runtime and memory usage due to the combinatorial explosion of the search space for low utility thresholds and large databases. Hence, this paper proposes a more efficient algorith...
High Utility Pattern Mining – A Deep Review
High utility pattern mining is a new development in the area of data mining. Mining utility patterns under the itemset share framework is tricky because the interestingness measure has no anti-monotonicity property. Former works on this problem employ a two-phase, candidate-generation approach, with one exception that is, however, inefficient and does not scale to large databases. This paper reviews former implementations and strategies for mining high utility patterns in detail. We also look ahead to some strategies for mining sequential patterns.