Mining Sequential Rules Common to Several Sequences with the Window Size Constraint

CMRULES: An Efficient Algorithm for Mining Sequential Rules Common to Several Sequences

We propose CMRULES, an algorithm for mining sequential rules common to many sequences in sequence databases, not for mining rules appearing frequently in sequences. For this reason, the algorithm does not use a sliding-window approach. Instead, it first finds association rules to prune the search space for items that occur jointly in many sequences. Then it eliminates association rules that do not meet minimum confidence and support thresholds according to the time ordering. We evaluated the performance of CMRULES in three different ways. First, we provide an analysis of its time complexity. Second, we compared its performance on a public dataset with a variation of an algorithm from the literature. Results show that CMRULES is more efficient for low support thresholds and scales better. Lastly, we report a real application of the algorithm in a complex system.

RuleGrowth: Mining Sequential Rules Common to Several Sequences by Pattern-Growth

Mining sequential rules from large databases is an important topic in data mining with wide applications. Most of the relevant studies have focused on finding sequential rules appearing in a single sequence of events, and the task of mining rules across multiple sequences has been far less explored. In this paper, we present RuleGrowth, a novel algorithm for mining sequential rules common to several sequences. Unlike other algorithms, RuleGrowth uses a pattern-growth approach for discovering sequential rules, which makes it much more efficient and scalable. We present a comparison of RuleGrowth's performance with current algorithms on three public datasets. The experimental results show that RuleGrowth clearly outperforms current algorithms on all three datasets under low support and confidence thresholds and scales much better.

Mining Partially-Ordered Sequential Rules Common to Multiple Sequences

IEEE Transactions on Knowledge and Data Engineering, 2015

Sequential rule mining is an important data mining problem with multiple applications. An important limitation of algorithms for mining sequential rules common to multiple sequences is that rules are very specific and therefore many similar rules may represent the same situation. This can cause three major problems: (1) similar rules can be rated quite differently, (2) rules may not be found because they are individually considered uninteresting, and (3) rules that are too specific are less likely to be used for making predictions. To address these issues, we explore the idea of mining “partially-ordered sequential rules” (POSR), a more general form of sequential rules such that items in the antecedent and the consequent of each rule are unordered. To mine POSR, we propose the RuleGrowth algorithm, which is efficient and easily extendable. In particular, we present an extension (TRuleGrowth) that accepts a sliding-window constraint to find rules occurring within a maximum amount of time. A performance study with four real-life datasets shows that RuleGrowth and TRuleGrowth have excellent performance and scalability compared to baseline algorithms and that the number of rules discovered can be several orders of magnitude smaller when the sliding-window constraint is applied. Furthermore, we also report results from a real application showing that POSR can provide a much higher prediction accuracy than regular sequential rules for sequence prediction.
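To make the notion of a partially-ordered sequential rule concrete, here is a minimal Python sketch. It is not the RuleGrowth or TRuleGrowth algorithm itself, only a brute-force check of one rule. It assumes the usual reading of the definition: a rule X ⇒ Y occurs in a sequence of itemsets if there is a split point such that every item of X appears before it and every item of Y appears after it, in any order within each side. The function names, the toy database `DB`, and the interpretation of the window as a number of consecutive itemsets are illustrative assumptions, as is keeping the confidence denominator as the number of sequences containing X.

```python
from typing import List, Optional, Set, Tuple

Sequence = List[Set[str]]  # a sequence is an ordered list of itemsets


def occurs(x: Set[str], y: Set[str], seq: Sequence,
           window: Optional[int] = None) -> bool:
    """True if all items of x appear before all items of y in seq.

    If window is given, x and y must fit inside `window` consecutive
    itemsets (an assumed reading of the sliding-window constraint).
    """
    n = len(seq)
    w = n if window is None else min(window, n)
    for start in range(0, n - w + 1):
        win = seq[start:start + w]
        for k in range(1, len(win)):          # split point inside the window
            before = set().union(*win[:k])
            after = set().union(*win[k:])
            if x <= before and y <= after:
                return True
    return False


def seq_support_confidence(x: Set[str], y: Set[str], db: List[Sequence],
                           window: Optional[int] = None) -> Tuple[float, float]:
    """Sequential support and confidence of the rule x => y over db."""
    both = sum(occurs(x, y, s, window) for s in db)
    has_x = sum(x <= set().union(*s) for s in db)  # sequences containing x
    support = both / len(db)
    confidence = both / has_x if has_x else 0.0
    return support, confidence


# Hypothetical toy database of four sequences.
DB: List[Sequence] = [
    [{"a", "b"}, {"c"}, {"f"}, {"g"}, {"e"}],
    [{"a", "d"}, {"c"}, {"b"}, {"a", "b", "e", "f"}],
    [{"a"}, {"b"}, {"f"}, {"e"}],
    [{"b"}, {"f", "g"}],
]

print(seq_support_confidence({"a", "b", "c"}, {"e"}, DB))            # (0.5, 1.0)
print(seq_support_confidence({"a", "b", "c"}, {"e"}, DB, window=4))  # (0.25, 0.5)
```

On this toy data, tightening the window removes occurrences of the rule, which is consistent with the abstract's observation that far fewer rules survive when the sliding-window constraint is applied.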

Mining Frequent Sequential Rules with An Efficient Parallel Algorithm

The International Arab Journal of Information Technology, 2022

Sequential rule mining is one of the most common data mining techniques. It aims to find desired rules in large sequence databases. It can identify the essential information that helps acquire knowledge from large search spaces and select interesting rules from sequence databases. The key challenge is to avoid wasting time, which is particularly difficult in large sequence databases. This paper studies mining rules from two representations of sequential patterns in order to obtain compact databases without affecting the final result. In addition, it applies a parallel approach that uses a multi-core processor architecture for mining non-redundant sequential rules. It also applies pruning techniques to make rule generation more efficient. The proposed algorithm was evaluated by comparing it with another non-redundant sequential rule algorithm called Non-Redundant with Dynamic Bit Vector (NRD-DBV). Both algorithms were run on four real datasets with different ...

CMRules: Mining Sequential Rules Common to Several Sequences

Knowledge-Based Systems

Sequential rule mining is an important data mining task with wide applications. However, current algorithms for discovering sequential rules common to several sequences use very restrictive definitions of sequential rules, which make them unable to recognize that similar rules can describe the same phenomenon. This can have many undesirable effects, such as (1) similar rules that are rated differently, (2) rules that are not found because they are considered uninteresting when taken individually, and (3) rules that are too specific, which makes them less likely to be used for making predictions. In this paper, we address these problems by proposing a more general form of sequential rules such that items in the antecedent and in the consequent of each rule are unordered. We propose an algorithm named CMRules for mining them. The algorithm proceeds by first finding association rules to prune the search space for items that occur jointly in many sequences. Then it eliminates association rules that do not meet the minimum confidence and support thresholds according to the time ordering. We evaluate the performance of CMRules in three different ways.
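The two-phase idea described above can be sketched in Python as follows. This is a simplified illustration, not the published CMRules implementation: phase 1 is replaced by brute-force itemset enumeration (a real implementation would presumably use an association rule miner such as Apriori), and the names `cmrules_sketch`, `occurs_in_order`, `max_size`, and the toy database `DB` are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Sequence = List[Set[str]]
Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]


def occurs_in_order(x: Set[str], y: Set[str], seq: Sequence) -> bool:
    """True if every item of x appears before every item of y in seq."""
    for k in range(1, len(seq)):
        if x <= set().union(*seq[:k]) and y <= set().union(*seq[k:]):
            return True
    return False


def cmrules_sketch(db: List[Sequence], min_sup: float, min_conf: float,
                   max_size: int = 3) -> List[Rule]:
    # Phase 1: ignore time ordering and treat each sequence as a transaction.
    transactions = [set().union(*s) for s in db]
    items = sorted(set().union(*transactions))
    rules: List[Rule] = []
    for size in range(2, max_size + 1):
        for itemset in combinations(items, size):
            iset = set(itemset)
            cover = [i for i, t in enumerate(transactions) if iset <= t]
            # Prune: the items must occur jointly in enough sequences.
            if not cover or len(cover) / len(db) < min_sup:
                continue
            # Phase 2: re-check each candidate rule against the time ordering.
            for r in range(1, size):
                for left in combinations(itemset, r):
                    x = set(left)
                    y = iset - x
                    sup_x = sum(x <= t for t in transactions)
                    both = sum(occurs_in_order(x, y, db[i]) for i in cover)
                    seq_sup = both / len(db)
                    seq_conf = both / sup_x
                    if seq_sup >= min_sup and seq_conf >= min_conf:
                        rules.append((frozenset(x), frozenset(y),
                                      seq_sup, seq_conf))
    return rules


# Hypothetical toy database of four sequences.
DB: List[Sequence] = [
    [{"a", "b"}, {"c"}, {"f"}, {"g"}, {"e"}],
    [{"a", "d"}, {"c"}, {"b"}, {"a", "b", "e", "f"}],
    [{"a"}, {"b"}, {"f"}, {"e"}],
    [{"b"}, {"f", "g"}],
]

for x, y, sup, conf in cmrules_sketch(DB, min_sup=0.5, min_conf=1.0, max_size=4):
    print(sorted(x), "=>", sorted(y), sup, conf)
```

The pruning in phase 1 is safe because a rule can only occur in order within a sequence that contains all of its items, so its sequential support can never exceed the joint support of those items.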

An Efficient Algorithm for Mining Sequential Rules with Interestingness Measures

2013

Mining sequential rules is an important problem in data mining research. It is commonly used for market decisions, management and behaviour analysis. In traditional association-rule mining, rule interestingness measures such as confidence are used for determining relevant knowledge. They can reduce the size of the search space and select useful or interesting rules from the set of discovered ones. Many studies have examined interestingness measures for mining association rules, but they have not addressed mining sequential rules in sequence databases. In this paper, we thus consider and apply several interestingness measures to generate all relevant sequential rules from a sequence database. The prefix tree structure is also used to get the support values of sequential patterns faster and reduce the execution time for mining sequential rules. Our experimental results show that the run time for mining sequential rules with interestingness measures on the prefix tree structure...

IMSR_PreTree: an improved algorithm for mining sequential rules based on the prefix-tree

Vietnam Journal of Computer Science, 2014

Sequential rules generated from sequential patterns express temporal relationships among patterns. Sequential rule mining is an important research problem because it has broad applications, such as the analysis of customer purchases, web logs, DNA sequences, and so on. However, developing an efficient algorithm for mining sequential rules is a difficult problem due to the large size of the sequential pattern set. The larger the sequential pattern set, the longer the mining time. In this paper, we propose a new algorithm called IMSR_PreTree, an improved version of MSR_PreTree that mines sequential rules based on a prefix-tree. IMSR_PreTree also generates rules from frequent sequences stored in a prefix-tree, but it prunes the subtrees that give non-significant rules very early in the process of rule generation and avoids tree scanning as much as possible. Thus, IMSR_PreTree can significantly reduce the search space during the mining process. Our performance study shows that IMSR_PreTree outperforms MSR_PreTree, especially on large sequence databases.

ERMiner: Sequential Rule Mining Using Equivalence Classes

Lecture Notes in Computer Science, 2014

Sequential rule mining is an important data mining task with wide applications. The current state-of-the-art algorithm (RuleGrowth) for this task relies on a pattern-growth approach to discover sequential rules. A drawback of this approach is that it repeatedly performs a costly database projection operation, which deteriorates performance for datasets containing dense or long sequences. In this paper, we address this issue by proposing an algorithm named ERMiner (Equivalence class based sequential Rule Miner) for mining sequential rules. It relies on the novel idea of searching using equivalence classes of rules having the same antecedent or consequent. Furthermore, it includes a data structure named SCM (Sparse Count Matrix) to prune the search space. An extensive experimental study with five real-life datasets shows that ERMiner is up to five times faster than RuleGrowth but consumes more memory.

A Survey on Sequential Rule Mining Techniques

2018

Data mining, also known as knowledge discovery in databases, has been recognized as the process of extracting non-trivial, inherent, previously unknown, and potentially useful information from data in databases. The discovered knowledge can be employed in numerous ways in analogous applications. The most vital tasks in data mining are determining association rules and frequent itemsets. Frequent itemset mining plays a vital role in association rule mining. In the last few years, a range of approaches for uncovering frequent itemsets in especially huge databases have emerged. Although a large number of algorithms have been designed for frequent pattern mining, developing efficient and scalable algorithms is still very challenging. In this paper, we have provided a survey of various sequential rule

Enhanced parallel mining algorithm for frequent sequential rules

Ain Shams Engineering Journal, 2021

Sequential rule mining is an important data mining technique that discovers relationships between occurrences of sequential patterns. The main challenge is to avoid time-consuming computation, especially in large search spaces. This can be achieved by developing an efficient sequential rule mining algorithm without redundancy in a long sequence dataset. In this paper, an algorithm named PNRD-CloGen is proposed. It can be used to mine sequential rules from both frequent closed dynamic bit vectors and sequential generator patterns at the same time. It helps in speedily eliminating uninteresting candidates and compacting the representations. Additionally, we apply a parallel approach utilizing a multi-core architecture. An experimental evaluation was performed using five real sequence datasets: BMSWebView1, Sign, FIFA, Kosarak, and MSNBC. The proposed algorithm has been compared with two non-redundant sequential rule algorithms: TRuleGrowth and NRD-DBV. Experimental results show time savings, especially for large sequence datasets and low minimum support thresholds.