An Algorithm for Mining Association Rules with Weighted Minimum Supports
Related papers
Mining association rules with multiple minimum supports
1999
Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Most previous approaches set a single minimum support threshold for all items or itemsets. In real applications, however, different items may have different criteria for judging their importance, so the support requirements should vary from item to item. In this paper, we provide another point of view on defining the minimum supports of itemsets when items have different minimum supports. The maximum constraint is used, which is well explained and may be suitable for some mining domains. We then propose a simple algorithm based on the Apriori approach to find the large itemsets and association rules under this constraint. The proposed algorithm is simple and efficient when compared to Wang et al.'s under the maximum constraint. The numbers of association rules and large itemsets obtained by the proposed mining algorithm using the maximum constraint are also smaller than those obtained using the minimum constraint. Whether to adopt the proposed approach thus depends on the requirements of the mining problem. In addition, the granular computing technique of bit strings is used to speed up the proposed data mining algorithm.
Decision Support Systems, 2006
Mining association rules with multiple minimum supports is an important generalization of the association-rule-mining problem, which was recently proposed by Liu et al. Instead of setting a single minimum support threshold for all items, they allow users to specify multiple minimum supports to reflect the nature of the items, and an Apriori-based algorithm, named MSapriori, is developed to mine all frequent itemsets. In this paper, we study the same problem but with two additional improvements. First, we propose an FP-tree-like structure, the MIS-tree, to store the crucial information about frequent patterns. Accordingly, an efficient MIS-tree-based algorithm, called the CFP-growth algorithm, is developed for mining all frequent itemsets. Second, since each item can have its own minimum support, it is very difficult for users to set appropriate thresholds for all items at once. In practice, users need to tune the items' supports and run the mining algorithm repeatedly until a satisfactory result is reached. To speed up this time-consuming tuning process, an efficient algorithm that can maintain the MIS-tree structure without rescanning the database is proposed. Experiments on both synthetic and real-life datasets show that our algorithms are much more efficient and scalable than the previous algorithm.
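As a rough illustration of mining with multiple minimum supports, the sketch below assumes the convention usually attributed to Liu et al.: an itemset is frequent when its support reaches the lowest minimum item support (MIS) among its items. It is a brute-force enumeration written for clarity, not MSapriori or CFP-growth; the function names, the toy transactions, and the MIS values are all hypothetical.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def frequent_itemsets_min_constraint(transactions, mis):
    """Brute-force enumeration (illustrative only, exponential in the item count).

    Assumption: an itemset's threshold is the LOWEST MIS among its members.
    """
    items = sorted(mis)
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            threshold = min(mis[i] for i in candidate)
            s = support(candidate, transactions)
            if s >= threshold:
                result[candidate] = s
    return result

# Hypothetical toy data: caviar is rare, so it gets a lower MIS.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk", "caviar"}),
    frozenset({"bread"}),
    frozenset({"milk"}),
]
mis = {"bread": 0.5, "milk": 0.5, "caviar": 0.25}
found = frequent_itemsets_min_constraint(transactions, mis)
```

With a single threshold of 0.5, every itemset containing caviar would be discarded; with the per-item thresholds above, all seven non-empty itemsets qualify in this toy example.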
Efficient mining of generalized association rules with non-uniform minimum support
Data & Knowledge Engineering, 2007
Mining generalized association rules between items in the presence of taxonomies has been recognized as an important model in data mining. Earlier work on generalized association rules required the minimum supports to be specified uniformly for all items or for items within the same taxonomy level. This constraint on minimum support prevents an expert from discovering deviations or exceptions that are more interesting but much less supported than general trends. In this paper, we extend the scope of mining generalized association rules in the presence of taxonomies to allow any form of user-specified multiple minimum supports. We discuss the problems of using classic Apriori itemset generation and present two algorithms, MMS_Cumulate and MMS_Stratify, for discovering the generalized frequent itemsets. Empirical evaluation shows that these two algorithms are very effective and have good linear scale-up characteristics.
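To make the role of a taxonomy concrete, the sketch below shows one common preprocessing idea for generalized rules: extending each transaction with the ancestors of its items, so that higher-level categories accumulate support from their descendants. This is only an assumed illustration, not MMS_Cumulate or MMS_Stratify; the `parent` map, the item names, and the helper functions are hypothetical, and the taxonomy is assumed to be a tree without cycles.

```python
def ancestors(item, parent):
    """Walk up the (assumed acyclic) taxonomy and collect every ancestor of `item`."""
    found = []
    while item in parent:
        item = parent[item]
        found.append(item)
    return found

def extend_with_ancestors(transaction, parent):
    """Return the transaction plus all ancestors of its items."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, parent))
    return frozenset(extended)

def item_support(item, transactions):
    """Fraction of transactions containing `item`."""
    return sum(1 for t in transactions if item in t) / len(transactions)

# Hypothetical taxonomy: child -> parent.
parent = {"laptop": "computer", "desktop": "computer", "computer": "electronics"}
raw = [{"laptop"}, {"desktop"}, {"mouse"}]
extended = [extend_with_ancestors(t, parent) for t in raw]
```

In this toy data each leaf item appears in one of three transactions, while the category "computer" appears in two of the extended transactions. Differences of this kind are one reason a uniform minimum support across taxonomy levels can be too coarse.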
An Automated Association Rule Mining Technique With Cumulative Support Thresholds
Int. J. Open Problems in Compt. Math, 2009
Association rule mining is a data mining task for discovering hidden, interesting associations between items in a database. To find the relevant associations, the user has to specify support and confidence thresholds. These thresholds play an important role in determining the number of appropriate rules found. Users have difficulty specifying appropriate thresholds without knowledge of the itemsets and their frequencies in the database. A high support threshold avoids generating too many rules, but at the cost of losing interesting rules with low support. This paper proposes an approach for setting suitable support thresholds for frequent itemset generation. Experimental results show that this approach produces interesting rules without requiring a user-specified support threshold.
An improved algorithm for mining association rules using multiple support values
Proc. of FLAIRS Internat. Conf., St. …, 2003
Almost all approaches in association rule mining suggest the use of a single minimum support, a technique that either rules out all infrequent itemsets or suffers from the bottleneck of generating and examining too many candidate large itemsets. In this paper we consider ...
Mining Association Rules from Infrequent Itemsets: A Survey
International Journal of Innovative Research in Science, Engineering and Technology, 2013
Association Rule Mining (AM) is one of the most popular data mining techniques. It generates a large number of rules based on support and confidence. However, post-analysis is required to obtain interesting rules, as many of the generated rules are useless. Moreover, the database can be very large, so finding all the association rules is very time consuming, and users may be interested only in the associations among some items. The goal is therefore to mine association rules in a way that maximizes the occurrences of useful patterns. In this paper we study several aspects of this direction and analyze the previous research in order to identify its advantages and disadvantages.
Optimized High-Utility Itemsets Mining for Effective Association Mining Paper
International Journal of Electrical and Computer Engineering (IJECE), 2017
Association rule mining is commonly used to determine the frequent itemsets of a transactional database; however, market-behavior applications also need to consider the utility of itemsets. Apriori and FP-growth generate association rules without taking the utility of items into account. High-utility itemset mining (HUIM) is a well-known method that selects itemsets by their utility value; the resulting itemsets are known as high-utility itemsets. The fastest high-utility mining method (FHM) is an enhanced version of HUIM. FHM reduces the number of join operations during itemset generation, so it is faster than HUIM. For large datasets, both methods are very expensive. The proposed method addresses this issue by building a pruning-based utility co-occurrence structure (PEUCS) to eliminate low-profit itemsets, so that it processes only an optimal number of high-utility itemsets; it is therefore called optimal FHM (OFHM). Experimental results show that OFHM takes less computational runtime and is therefore more efficient than other existing methods on large benchmark datasets.
1. INTRODUCTION
Association rule mining methods [1] are used to discover rules and itemsets that are frequent and of interest to the user. Existing association mining methods [2-3] use the support-confidence framework [4] to discover rules of interest to the user. However, this framework is not sufficient for measuring the utility of itemsets. To find the utility of itemsets [5], the traditional support-confidence framework is enhanced to measure the semantic relations among items, taking into account a semantic measure of the rule, i.e., the importance of each item in the rule. Frequent itemset mining (FIM) [6] is one of the most important data mining tasks and is popular in a wide range of real-life applications.
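The difference between frequency and utility can be illustrated with a small sketch. It assumes the standard definition of itemset utility (purchased quantity times unit profit, summed over the itemset's items in every transaction that contains the whole itemset). It is not HUIM, FHM, or OFHM; the function name, the quantities, and the profit values are hypothetical.

```python
def itemset_utility(itemset, transactions, unit_profit):
    """Sum of quantity * unit profit over all transactions containing the whole itemset."""
    total = 0
    for quantities in transactions:
        if all(item in quantities for item in itemset):
            total += sum(quantities[item] * unit_profit[item] for item in itemset)
    return total

# Hypothetical data: each transaction maps an item to the purchased quantity.
transactions = [
    {"pen": 2, "ink": 1},
    {"pen": 1},
    {"pen": 3, "ink": 2, "paper": 5},
]
unit_profit = {"pen": 1, "ink": 4, "paper": 2}

pen_utility = itemset_utility({"pen"}, transactions, unit_profit)
pen_ink_utility = itemset_utility({"pen", "ink"}, transactions, unit_profit)
```

In this toy data the itemset {pen} occurs in all three transactions but has a utility of only 6, whereas {pen, ink} occurs in two transactions and has a utility of 17, so a purely frequency-based ranking would order them differently from a utility-based one.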
FIM discovers frequent itemsets from a given transaction database using either Apriori or FP-growth [7], so the results contain the itemsets that appear frequently in the transactions. Apriori and FP-growth generate frequent itemsets without considering the profit of the itemsets. It is increasingly recognized that the importance of frequent itemsets can also be considered in terms of profit or utility. High-utility itemsets are sets of frequent items with high utility. High-utility itemset mining (HUIM) [8] methods play a vital role in producing the set of high-utility frequent itemsets [9]. Association rule mining is one of the popular knowledge discovery methods for finding relationships among items. The aim of traditional association rule mining (or Apriori) is to discover the frequent itemsets, which characterize the itemsets of each transaction in the transactional database. One limitation of this mining system is that it does not consider the other factors of
Mining association rules with multiple minimum supports using maximum constraints
International Journal of Approximate Reasoning, 2005
Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Most previous approaches set a single minimum support threshold for all items or itemsets. In real applications, however, different items may have different criteria for judging their importance, so the support requirements should vary from item to item. In this paper, we provide another point of view on defining the minimum supports of itemsets when items have different minimum supports. The maximum constraint is used, which is well explained and may be suitable for some mining domains. We then propose a simple algorithm based on the Apriori approach to find the large itemsets and association rules under this constraint. The proposed algorithm is simple and efficient when compared to Wang et al.'s under the maximum constraint. The numbers of association rules and large itemsets obtained by the proposed mining algorithm using the maximum constraint are also smaller than those obtained using the minimum constraint. Whether to adopt the proposed approach thus depends on the requirements of the mining problem. In addition, the granular computing technique of bit strings is used to speed up the proposed data mining algorithm.
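The two ideas in this abstract, the maximum constraint and bit-string counting, can be sketched together as follows. The sketch assumes that under the maximum constraint an itemset is large when its support reaches the highest minimum support among its items. Under that assumption every subset of a large itemset is also large, so ordinary level-wise Apriori pruning applies. This is an illustrative reading, not the authors' algorithm; all names and the toy data are hypothetical, and Python integers stand in for bit strings.

```python
from itertools import combinations

def item_bitmaps(transactions):
    """Map each item to an int whose k-th bit is set when transaction k contains it."""
    bitmaps = {}
    for k, t in enumerate(transactions):
        for item in t:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << k)
    return bitmaps

def count(itemset, bitmaps, n):
    """Transactions containing all items: popcount of the AND of the bit strings."""
    bits = (1 << n) - 1
    for item in itemset:
        bits &= bitmaps.get(item, 0)
    return bin(bits).count("1")

def large_itemsets_max_constraint(transactions, mis):
    """Level-wise search; threshold = HIGHEST minimum support among the items (assumption)."""
    n = len(transactions)
    bitmaps = item_bitmaps(transactions)

    def is_large(itemset):
        return count(itemset, bitmaps, n) / n >= max(mis[i] for i in itemset)

    level = {frozenset([i]) for i in mis if is_large(frozenset([i]))}
    large = set(level)
    k = 2
    while level:
        # Join step: unions of two large (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must already be large.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c for c in candidates if is_large(c)}
        large |= level
        k += 1
    return large

# Hypothetical toy data.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk", "caviar"}),
    frozenset({"bread"}),
    frozenset({"milk"}),
]
mis = {"bread": 0.5, "milk": 0.5, "caviar": 0.25}
large = large_itemsets_max_constraint(transactions, mis)
```

In this toy example only the three single items and {bread, milk} are large: any itemset that combines caviar with bread or milk must reach the higher threshold of 0.5 and fails to do so.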
Mining Minimal Non-redundant Association Rules Using Frequent Closed Itemsets
2000
The relevance and usefulness of extracted association rules is of primary importance because, in the majority of cases, real-life databases yield several thousand association rules with high confidence, many of which are redundant. Using the closure of the Galois connection, we define two new bases for association rules whose union is a generating set for all valid association rules with their support and confidence. These bases are characterized using frequent closed itemsets and their generators; they consist of the non-redundant exact and approximate association rules having minimal antecedents and maximal consequents, i.e., the most relevant association rules. Algorithms for extracting these bases are presented, and results of experiments carried out on real-life databases show that the proposed bases are useful and that their generation is not time consuming.
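The closure operator behind closed itemsets can be sketched in a few lines. The sketch assumes the usual definition: the closure of an itemset is the intersection of all transactions that contain it, and an itemset is closed when it equals its own closure. It is not one of the paper's basis-extraction algorithms; the function names and toy transactions are hypothetical.

```python
def closure(itemset, transactions):
    """Intersection of all transactions containing `itemset` (Galois closure).

    If no transaction contains the itemset, the closure is taken to be
    the set of all items, following the usual convention.
    """
    itemset = frozenset(itemset)
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return frozenset().union(*transactions)
    return frozenset.intersection(*covering)

def is_closed(itemset, transactions):
    """An itemset is closed when it equals its own closure."""
    return closure(itemset, transactions) == frozenset(itemset)

# Hypothetical toy data.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "b", "d"}),
    frozenset({"c"}),
]
closure_of_a = closure({"a"}, transactions)
```

Here the closure of {a} is {a, b}: every transaction containing a also contains b, so {a} is not closed but acts as a generator of the closed itemset {a, b}, which corresponds to an exact rule from a to b.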
An efficient method for mining association rules based on minimum single constraints
Vietnam Journal of Computer Science, 2014
Mining association rules with constraints allows us to concentrate on discovering a useful subset instead of the complete set of association rules. With the aim of satisfying the needs of users and improving the efficiency and effectiveness of the mining task, various constraints and mining algorithms have been proposed. In practice, finding rules regarding specific itemsets is of interest. Thus, this paper considers the problem of mining association rules whose left-hand and right-hand sides contain two given itemsets, respectively. In addition, they also have to satisfy two given maximum support and confidence constraints. Applying previous algorithms to this problem may run into disadvantages such as the generation of many redundant candidates, time-consuming constraint checks, and repeated reading of the database when the constraints are changed. The paper proposes an equivalence relation using the closure of an itemset to partition the solution set into disjoint equivalence classes, and a new, efficient representation of the rules in each class based on the lattice of closed itemsets and their generators. The paper also develops a new algorithm, called MAR-MINSC, to rapidly mine all constrained rules from the lattice instead of mining them directly from the database. Theoretical results are proven to be reliable. Because MAR-MINSC does not suffer from the drawbacks above, in extensive experiments on many databases it achieves outstanding performance in comparison with some existing algorithms for mining association rules with the constraints mentioned.