High Utility Pattern Mining – A Deep Review

Comparative Study of Different High Utility Pattern Mining Techniques

Mining high utility patterns is an important research area in data mining. Several business applications benefit from the discovery of high utility itemsets and association rules in transaction databases. This paper presents a comprehensive survey of existing methods for high utility itemset mining and for association rule mining with utility considerations. The techniques covered include incremental mining of high utility patterns, mining high average-utility patterns with multiple minimum average-utility thresholds, bio-inspired algorithms, an algorithm for incremental and interactive high utility itemset mining, and temporal-based fuzzy utility mining. Some issues remain unresolved, and these are discussed in this paper. Various algorithms are studied for a comparative analysis of high utility pattern mining algorithms, and a new method is proposed.

IRJET- Study of Algorithms for Mining High Utility Itemsets

IRJET, 2020

Data mining techniques are applied to find meaningful information and patterns in large databases. Traditional frequent itemset mining (FIM) algorithms generate a large number of frequent itemsets by considering only how often an itemset occurs. They do not take into account the utility aspect, that is, the quantity and profit of the items purchased. Hence, as an extension of FIM, high utility itemset (HUI) mining is emerging; it finds all itemsets whose utility meets a user-specified minimum utility threshold. High-utility itemset mining (HUIM) aims to find the complete set of itemsets having high utilities in a given dataset. High average-utility itemset mining (HAUIM) is a variation of HUIM. HAUIM provides an alternative measure, the average-utility, which discovers itemsets by taking into consideration both their utility values and their lengths. Efficient algorithms named TKU (mining Top-K Utility itemsets) and TKO (mining Top-K utility itemsets in One phase) are used in HUIM. A pattern growth approach is specified for efficient mining of HAUIs. This paper studies the different algorithms for mining high utility itemsets.
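
The measures named in this abstract can be illustrated with a minimal Python sketch. The toy transactions, the unit profits, and all function names are hypothetical, and the average-utility is assumed to follow the common definition (itemset utility divided by itemset length). The enumeration is brute force, for illustration only; it is not TKU, TKO, or any pattern growth algorithm.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps an item to its purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 2},
]
# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 4}


def utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over every transaction containing the whole itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def average_utility(itemset, transactions, unit_profit):
    """Assumed definition: utility divided by the number of items in the itemset."""
    return utility(itemset, transactions, unit_profit) / len(itemset)


def all_itemsets(transactions):
    """Yield every non-empty itemset over the items seen, as sorted tuples."""
    items = sorted({item for t in transactions for item in t})
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


def high_utility_itemsets(transactions, unit_profit, min_util):
    """Return {itemset: utility} for itemsets whose utility meets the threshold."""
    result = {}
    for itemset in all_itemsets(transactions):
        u = utility(itemset, transactions, unit_profit)
        if u >= min_util:
            result[itemset] = u
    return result


def top_k(transactions, unit_profit, k):
    """Return the k itemsets with the highest utility, best first."""
    scored = [(x, utility(x, transactions, unit_profit)) for x in all_itemsets(transactions)]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


print(high_utility_itemsets(transactions, unit_profit, min_util=10))
print(top_k(transactions, unit_profit, k=2))
print(average_utility(("a", "c"), transactions, unit_profit))
```

On this toy data, the itemset (a, b) has utility 9 while its superset (a, b, c) has utility 10, so a threshold of 10 accepts the superset but not the subset. This is why the support-based pruning used in FIM cannot be applied to utility directly.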

Various Research Opportunities in High Utility Itemset Mining

International Journal of Recent Technology and Engineering

Pattern mining is a technique that discovers interesting, hidden, unpredicted and useful patterns in the data of a database. Most research in pattern mining has focused on the traditional approaches of Frequent Itemset Mining (FIM) and Association Rule Mining (ARM) for pattern discovery. Patterns in frequent itemset mining are based on the occurrence frequency of items. Although frequent pattern mining is useful, the assumption that "frequent patterns are interesting" does not hold for numerous applications. High Utility Itemset Mining (HUIM) overcomes this limitation of frequent itemset mining. The aim of HUIM is to find patterns based on a utility function, where the utility can be measured in terms of revenue, profit, weight, frequency, interestingness, time spent on a webpage, etc. Mining patterns with high utility can be seen as a generalization of FIM where the transaction database is the input and every item has a utility factor representing its import...

Mining High Utility Itemsets with Regular Occurrence

Journal of ICT Research and Applications, 2016

High utility itemset mining (HUIM) plays an important role in the data mining community and in a wide range of applications. For example, in retail business it is used for finding sets of sold products that give high profit, low cost, etc. These itemsets can help improve marketing strategies, plan promotions and advertisements, etc. However, since HUIM only considers the utility values of items and itemsets, it may not be sufficient for observing the product-buying behavior of customers, such as information related to "regular purchases of sets of products having a high profit margin". To address this issue, the occurrence behavior of itemsets (in terms of regularity) was investigated together with their utility values. Then, the problem of mining high utility itemsets with regular occurrence (MHUIR), which finds sets of co-occurring items with high utility values and regular occurrence in a database, was considered. An efficient single-pass algorithm, called MHUIRA, was introduced. A new modified utility-list structure, called NUL, was designed to efficiently maintain utility values and occurrence information and to increase the efficiency of computing the utility of itemsets. Experimental studies on real and synthetic datasets and complexity analyses are provided to show the efficiency of MHUIRA combined with NUL in terms of time and space usage for mining interesting itemsets based on regularity and utility constraints.
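
Combining a utility constraint with a regularity constraint can be sketched as follows. The abstract does not define regularity, so this sketch assumes a definition common in regular pattern mining: the largest gap between consecutive occurrences of an itemset, counting the gap before the first occurrence and after the last one. The data, names, and thresholds are hypothetical, and the brute-force filter is not MHUIRA and does not use the NUL structure.

```python
from itertools import combinations

# Hypothetical toy database; list position (1-based) serves as the transaction id.
transactions = [
    {"a": 1, "b": 1},
    {"a": 2, "c": 1},
    {"b": 3},
    {"a": 1, "b": 2},
    {"c": 2},
    {"a": 1, "b": 1, "c": 1},
]
unit_profit = {"a": 3, "b": 2, "c": 1}  # hypothetical unit profits


def utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over transactions containing the whole itemset."""
    return sum(
        sum(t[item] * unit_profit[item] for item in itemset)
        for t in transactions
        if all(item in t for item in itemset)
    )


def regularity(itemset, transactions):
    """Assumed definition: largest gap between consecutive occurrences.

    The database start (id 0) and end (id = number of transactions) act as
    boundary occurrences, so an itemset that never occurs gets the full length.
    """
    tids = [
        tid
        for tid, t in enumerate(transactions, start=1)
        if all(item in t for item in itemset)
    ]
    boundaries = [0] + tids + [len(transactions)]
    return max(later - earlier for earlier, later in zip(boundaries, boundaries[1:]))


def regular_high_utility_itemsets(transactions, unit_profit, min_util, max_reg):
    """Keep itemsets with utility >= min_util and regularity <= max_reg."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset, transactions, unit_profit)
            r = regularity(itemset, transactions)
            if u >= min_util and r <= max_reg:
                result[itemset] = (u, r)
    return result


print(regular_high_utility_itemsets(transactions, unit_profit, min_util=15, max_reg=2))
```

With these thresholds, (a, b) has the highest utility (17) but is rejected because its largest gap is 3, while (a) passes both constraints. This is the kind of distinction a utility-only miner cannot make.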

DECISIVE HIGH-UTILITY ITEM SET MINING - AN INNOVATIVE ALGORITHM FOR MINING THE HIGH UTILITY ITEMSETS

Research on high-utility itemset mining (HUIM) has emerged as a new trend because of the profit factors associated with the field. When high-utility itemsets (HUIs) are mined, the amount of data obtained and knowledge derived is very large. In data mining, when a huge set of data is mined, useful information can be obtained by mining the patterns. This paper focuses on the issues relating to HUIM. It proposes a novel algorithm, the Decisive high-utility itemset mining (DHIM) algorithm, to alleviate the problems faced in HUIM. The proposed algorithm scans the transaction database and computes the transaction utility of each transaction.

A Survey of High Utility Itemset Mining

Studies in Big Data, 2019

High utility pattern mining is an emerging data science task, which consists of discovering patterns having a high importance in databases. The utility of a pattern can be measured in terms of various objective criteria such as its profit, frequency, and weight. Among the various kinds of high utility patterns that can be discovered in databases, high utility itemsets are the most studied. A high utility itemset is a set of values that appears in a database and has a high importance to the user, as measured by a utility function. High utility itemset mining generalizes the problem of frequent itemset mining by considering item quantities and weights. A popular application of high utility itemset mining is to discover all sets of items purchased together by customers that yield a high profit. This chapter provides an introduction to high utility itemset mining, reviews the state-of-the-art algorithms, their extensions and applications, and discusses research opportunities. This chapter is aimed both at those who are new to the field of high utility itemset mining and at researchers working in the field.

UP-Growth: An Efficient Algorithm for High Utility Itemset Mining

Mining high utility itemsets from a transactional database refers to the discovery of itemsets with high utility like profits. Although a number of relevant approaches have been proposed in recent years, they incur the problem of producing a large number of candidate itemsets for high utility itemsets. Such a large number of candidate itemsets degrades the mining performance in terms of execution time and space requirement. The situation may become worse when the database contains lots of long transactions or long high utility itemsets. In this paper, we propose an efficient algorithm, namely UP-Growth (Utility Pattern Growth), for mining high utility itemsets with a set of techniques for pruning candidate itemsets. The information of high utility itemsets is maintained in a special data structure named UP-Tree (Utility Pattern Tree) such that the candidate itemsets can be generated efficiently with only two scans of the database. The performance of UP-Growth was evaluated in comparison with the state-of-the-art algorithms on different types of datasets. The experimental results show that UP-Growth not only reduces the number of candidates effectively but also outperforms other algorithms substantially in terms of execution time, especially when the database contains lots of long transactions.

An Enhancing the Performance of High Utility Itemset Mining using Utility Information Record

International Journal of Pure and Applied Mathematics, Volume 118, No. 17, 2018, pp. 257-272

Discovering itemsets with high utility, such as profit, from a database is known as high utility itemset mining. In many real-time applications, such as retail marketing and Web services, high utility itemset mining is useful in the decision-making process. Efficient mining of high utility itemsets plays a very important role in many real-time applications and is a vital research issue in the data mining area. Existing high utility mining algorithms perform poorly because they take much time to generate a large number of candidate itemsets and to find the utility value of every candidate itemset. In this research article, the time and space complexity of the UP-Growth and UP-Growth+ algorithms have been reduced by ...

Survey on Mining High Utility Patterns in One Phase

International Journal of Engineering Research and, 2017

Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information, that is, information that can be used to increase revenue, cut costs, or both. Conventional data mining techniques have focused largely on finding the items that occur most frequently in transaction databases, which is called frequent itemset mining. These techniques are based on the support-confidence model. Under this model, itemsets that appear more frequently in the database are assumed to be more meaningful to the user from a business point of view. High utility itemset mining discovers itemsets by considering not only the frequency of an itemset but also the utility associated with it. Every itemset has a value such as quantity, profit, or another measure of user interest. This value is called the utility of that itemset. Itemsets having utility values greater than a given threshold are called high utility itemsets. All prior work on this problem employs a two-phase, candidate-generation approach, with one exception that is, however, inefficient and does not scale to large databases. The two-phase approach suffers from scalability issues due to the huge number of candidates. In this paper we present a survey of a novel algorithm that finds high utility patterns in a single phase without generating candidates. The novelties lie in a high utility pattern growth approach, a lookahead strategy, and a linear data structure. Concretely, the pattern growth approach searches a reverse set enumeration tree and prunes the search space by utility upper bounding. The lookahead strategy identifies high utility patterns without enumeration by means of a closure property and a singleton property. The linear data structure is used to compute a tight bound for powerful pruning and to directly identify high utility patterns in an efficient and scalable way, which targets the root cause of the problems with prior algorithms.
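
The idea of pruning a pattern growth search with a utility upper bound can be sketched in Python. This is not the surveyed algorithm: it uses a plain forward depth-first enumeration, with no reverse set enumeration tree, lookahead, or linear data structure. The bound is an assumed "remaining utility" bound, that is, the utility of the current itemset plus the utility of the items that could still be appended. The data and all names are hypothetical.

```python
# Hypothetical toy data: item -> quantity per transaction, and unit profits.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 2},
]
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 4}


def mine_depth_first(transactions, unit_profit, min_util):
    """Single-phase depth-first search that reports exact utilities directly.

    Itemsets are grown by appending items in sorted order. A branch is
    abandoned when its upper bound shows no extension can reach min_util.
    """
    items = sorted({item for t in transactions for item in t})
    result = {}

    def search(prefix, start):
        for idx in range(start, len(items)):
            candidate = prefix + (items[idx],)
            util = 0   # exact utility of the candidate
            bound = 0  # upper bound on the utility of any extension of it
            for t in transactions:
                if all(item in t for item in candidate):
                    u = sum(t[item] * unit_profit[item] for item in candidate)
                    # Utility of items that may still be appended in this transaction.
                    remaining = sum(
                        qty * unit_profit[item]
                        for item, qty in t.items()
                        if item > items[idx]
                    )
                    util += u
                    bound += u + remaining
            if util >= min_util:
                result[candidate] = util
            if bound >= min_util:  # otherwise prune the whole subtree
                search(candidate, idx + 1)

    search((), 0)
    return result


print(mine_depth_first(transactions, unit_profit, min_util=10))
```

No candidate set is kept for a second verification pass: each itemset's exact utility is known when it is visited, and the bound only decides whether the search descends further. On this data the branch rooted at (d) is cut immediately because its bound of 8 is below the threshold.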

Efficient Mining of High Utility Itemsets from Large Datasets

High utility itemset mining extends frequent pattern mining to discover itemsets in a transaction database with utility values above a given threshold. However, mining high utility itemsets presents a greater challenge than frequent itemset mining, since high utility itemsets lack the anti-monotone property of frequent itemsets. Transaction Weighted Utility (TWU), proposed recently by researchers, has the anti-monotone property, but it is an overestimate of itemset utility and therefore leads to a larger search space. We propose an algorithm that uses TWU with pattern growth based on a compact utility pattern tree data structure. Our algorithm implements a parallel projection scheme to use disk storage when the main memory is inadequate for dealing with large datasets. Experimental evaluation shows that our algorithm is more efficient than previous algorithms and can mine larger datasets of both dense and sparse data containing long patterns.
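
The role of TWU as an anti-monotone overestimate can be shown with a small Python sketch. It computes TWU as the sum of the utilities of all transactions containing an itemset and uses it only as a filter in a brute-force enumeration. The data and function names are hypothetical, and the sketch does not implement the utility pattern tree or the parallel projection scheme of the abstract.

```python
from itertools import combinations

# Hypothetical toy data: item -> quantity per transaction, and unit profits.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 2},
]
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 4}


def transaction_utility(t, unit_profit):
    """Total utility of one transaction (all of its items)."""
    return sum(qty * unit_profit[item] for item, qty in t.items())


def utility(itemset, transactions, unit_profit):
    """Exact utility: only the itemset's own items are counted."""
    return sum(
        sum(t[item] * unit_profit[item] for item in itemset)
        for t in transactions
        if all(item in t for item in itemset)
    )


def twu(itemset, transactions, unit_profit):
    """Transaction Weighted Utility: sum of whole-transaction utilities
    over the transactions that contain the itemset."""
    return sum(
        transaction_utility(t, unit_profit)
        for t in transactions
        if all(item in t for item in itemset)
    )


def mine_with_twu_filter(transactions, unit_profit, min_util):
    """Brute-force enumeration that skips itemsets whose TWU is too low."""
    items = sorted({item for t in transactions for item in t})
    # Items with TWU below the threshold cannot appear in any high utility itemset.
    promising = [i for i in items if twu((i,), transactions, unit_profit) >= min_util]
    result = {}
    for size in range(1, len(promising) + 1):
        for itemset in combinations(promising, size):
            if twu(itemset, transactions, unit_profit) < min_util:
                continue  # safe: TWU never underestimates the utility
            u = utility(itemset, transactions, unit_profit)
            if u >= min_util:
                result[itemset] = u
    return result


print(mine_with_twu_filter(transactions, unit_profit, min_util=12))
```

With a threshold of 12, item d is discarded up front (TWU 11), which removes every itemset containing it. The overestimate is also visible: (b, c) has TWU 21 but an exact utility of only 8, so it survives the filter and must still be checked exactly.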