An empirical quest for optimal rule learning heuristics

On the quest for optimal rule learning heuristics

2010

The primary goal of the research reported in this paper is to identify which criteria are responsible for the good performance of a heuristic rule evaluation function in a greedy top-down covering algorithm. We first argue that search heuristics for inductive rule learning algorithms typically trade off consistency and coverage, and we investigate this trade-off by determining optimal parameter settings for five different parametrized heuristics. In order to avoid biasing our study by known functional families, we also investigate the potential of using metalearning for obtaining alternative rule learning heuristics. The key results of this experimental study are practical default values for commonly used heuristics and a broad comparative evaluation of known and novel rule learning heuristics; we also gain theoretical insights into the factors that are responsible for good performance. For example, we observe that consistency should be weighted more heavily than coverage, presumably because a lack of coverage can later be corrected by learning additional rules.
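To make the setting concrete, here is a minimal Python sketch of a greedy top-down covering (separate-and-conquer) learner with a pluggable rule evaluation function h(p, n, P, N), where p and n are the positive and negative examples a rule covers and P and N are the totals. It assumes nominal attributes, a single positive class, and precision as a stand-in heuristic; the names `learn_one_rule`, `covering`, and `precision`, the stopping criteria, and the toy data are illustrative choices, not the paper's implementation.

```python
from typing import Callable, Dict, List, Tuple

Example = Dict[str, str]
Condition = Tuple[str, str]          # an attribute = value test
Rule = List[Condition]               # conjunction of tests; [] covers every example
Heuristic = Callable[[int, int, int, int], float]  # h(p, n, P, N)


def covers(rule: Rule, x: Example) -> bool:
    """True if example x satisfies every test in the rule body."""
    return all(x.get(a) == v for a, v in rule)


def count(rule: Rule, pos: List[Example], neg: List[Example]) -> Tuple[int, int]:
    """Return (p, n): covered positives and covered negatives."""
    return (sum(covers(rule, x) for x in pos),
            sum(covers(rule, x) for x in neg))


def precision(p: int, n: int, P: int, N: int) -> float:
    """Stand-in heuristic: fraction of covered examples that are positive."""
    return p / (p + n) if p + n else 0.0


def learn_one_rule(pos: List[Example], neg: List[Example], heuristic: Heuristic) -> Rule:
    """Top-down hill-climbing: specialize the empty rule one test at a time,
    always taking the best-scoring refinement, and return the best rule seen."""
    P, N = len(pos), len(neg)
    rule: Rule = []
    best_rule, best_score = rule, heuristic(*count(rule, pos, neg), P, N)
    while True:
        _, n = count(rule, pos, neg)
        used = {a for a, _ in rule}
        candidates = {(a, v) for x in pos if covers(rule, x)
                      for a, v in x.items() if a not in used}
        if n == 0 or not candidates:
            break  # the rule is consistent, or it cannot be specialized further
        score, cond = max((heuristic(*count(rule + [c], pos, neg), P, N), c)
                          for c in candidates)
        rule = rule + [cond]
        if score > best_score:
            best_rule, best_score = rule, score
    return best_rule


def covering(pos: List[Example], neg: List[Example], heuristic: Heuristic) -> List[Rule]:
    """Separate-and-conquer: learn a rule, remove the positives it covers, repeat."""
    rules: List[Rule] = []
    remaining = list(pos)
    while remaining:
        rule = learn_one_rule(remaining, neg, heuristic)
        rules.append(rule)
        remaining = [x for x in remaining if not covers(rule, x)]
    return rules
```

Every rule returned by `learn_one_rule` covers at least one remaining positive, so the outer loop always terminates. Swapping `precision` for a heuristic that rewards coverage more strongly changes which refinement wins at each step; that choice of heuristic is the knob such a study tunes.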

An Empirical Investigation of the Trade-Off between Consistency and Coverage in Rule Learning Heuristics

Lecture Notes in Computer Science, 2008

In this paper, we argue that search heuristics for inductive rule learning algorithms typically trade off consistency and coverage, and we investigate this trade-off by determining optimal parameter settings for five different parametrized heuristics. This empirical comparison yields several interesting results. Of considerable practical importance are the default values that we establish for these heuristics, which we show to outperform commonly used instantiations of these heuristics. We also gain some theoretical insights. For example, we note that it is important to relate the rule coverage to the class distribution, but that the true positive rate should be weighted more heavily than the false positive rate. We also find that the optimal parameter settings of these heuristics effectively implement quite similar preference criteria.
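One simple way to relate coverage to the class distribution and weight the true positive rate against the false positive rate is a linear combination c·TPR − (1 − c)·FPR, a form sometimes called a relative cost measure. The sketch below assumes that form, with a hypothetical parameter `c`, a hypothetical function name `relative_cost`, and made-up coverage counts. It only shows how c shifts the preference between a broad rule and a narrow consistent one; it does not reproduce the tuned settings reported in the paper.

```python
def relative_cost(p: int, n: int, P: int, N: int, c: float) -> float:
    """Linear trade-off c * TPR - (1 - c) * FPR; c > 0.5 weights TPR more heavily."""
    tpr = p / P  # fraction of all positives the rule covers
    fpr = n / N  # fraction of all negatives the rule covers
    return c * tpr - (1 - c) * fpr


# Hypothetical rules on a dataset with P = N = 100 (counts are illustrative only):
# rule A covers more positives but also some negatives; rule B is consistent but narrow.
rule_a = (40, 20)
rule_b = (20, 0)

for c in (0.3, 0.7):
    a = relative_cost(*rule_a, 100, 100, c)
    b = relative_cost(*rule_b, 100, 100, c)
    print(f"c={c}: A={a:.2f}, B={b:.2f}, preferred={'A' if a > b else 'B'}")
```

With these counts, c = 0.3 prefers the consistent rule B (0.06 versus -0.02), while c = 0.7 prefers the broader rule A (0.22 versus 0.14). Dividing by P and N is what ties the score to the class distribution rather than to raw counts.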

A heuristic covering algorithm has higher predictive accuracy than learning all rules

1996

The induction of classification rules has been dominated by a single generic technique: the covering algorithm. This approach employs a simple hill-climbing search to learn sets of rules. Such search is subject to numerous widely known deficiencies. Further, there is a growing body of evidence that learning redundant sets of rules can improve predictive accuracy. The ultimate end-point of a move toward learning redundant rule sets would appear to be to learn and employ all possible rules. This paper presents a learning system that does this. An empirical investigation shows that, while the approach often achieves higher predictive accuracy than a covering algorithm, the covering algorithm outperforms induction of all rules significantly more frequently. Preliminary analysis suggests that learning all rules performs well when the training set clearly defines the decision surfaces but that the heuristic covering algorithm performs better when the decision surfaces are not clearly d...

On trading off consistency and coverage in inductive rule learning

2006

Evaluation metrics for rule learning typically, in one way or another, trade off consistency and coverage. In this work, we investigate this trade-off for three different families of rule learning heuristics, each featuring a parameter that implements this trade-off in a different guise. These heuristics are the m-estimate, the F-measure, and the Klösgen measures. The main goals of this work are to extend our understanding of these heuristics by visualizing their behavior via isometrics in coverage space, and to determine optimal parameter settings for them. Interestingly, even though the heuristics implement this trade-off in quite different ways, their optimal settings realize quite similar evaluation functions. Our empirical results on a large number of datasets demonstrate that, even though we do not use any form of pruning, the rules learned with these settings outperform those learned with standard rule learning heuristics and approach the performance of Ripper, a state-of-the-art rule learning system that uses extensive pruning and optimization phases.
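For reference, the three parametrized families can be written in terms of the coverage counts p, n, P, and N. The sketch below uses the common textbook forms (the m-estimate with parameter m, the F-measure with β, and the Klösgen measure with ω); the function names are hypothetical, the paper's exact definitions may differ in detail, and no tuned parameter values are assumed. Small parameter values favor consistency (m = 0 and ω = 0 order rules by precision, and the F-measure approaches precision as β → 0), while larger values give coverage more weight.

```python
def m_estimate(p: int, n: int, P: int, N: int, m: float) -> float:
    """Precision smoothed toward the class prior P / (P + N) with weight m."""
    return (p + m * P / (P + N)) / (p + n + m)


def f_measure(p: int, n: int, P: int, N: int, beta: float) -> float:
    """Weighted harmonic mean of precision p/(p+n) and recall p/P."""
    if p == 0:
        return 0.0
    prec = p / (p + n)
    rec = p / P
    b2 = beta ** 2
    return (1 + b2) * prec * rec / (b2 * prec + rec)


def kloesgen(p: int, n: int, P: int, N: int, omega: float) -> float:
    """coverage^omega * (precision - prior); omega = 1 gives weighted relative accuracy."""
    if p + n == 0:
        return 0.0  # assumption: an empty rule gets a neutral score
    coverage = (p + n) / (P + N)
    return coverage ** omega * (p / (p + n) - P / (P + N))


# A hypothetical rule covering 20 of 50 positives and 5 of 50 negatives.
print(m_estimate(20, 5, 50, 50, 0.0))  # plain precision
print(f_measure(20, 5, 50, 50, 1.0))   # balanced F1
print(kloesgen(20, 5, 50, 50, 1.0))    # weighted relative accuracy
```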

An Empirical Comparison of Hill-Climbing and Exhaustive Search in Inductive Rule Learning

Most commonly used inductive rule learning algorithms employ a hill-climbing search, whereas local pattern discovery algorithms employ exhaustive search. In this paper, we evaluate the spectrum of different search strategies to see whether separate-and-conquer rule learning algorithms are able to gain performance in terms of predictive accuracy or theory size by using more powerful search strategies like beam search or exhaustive search. Unlike previous results that demonstrated that rule learning algorithms suffer from over-searching, our work pays particular attention to the connection between the search heuristic and the search strategy, and we show that for some rule evaluation functions, complex search algorithms will consistently improve results without suffering from the over-searching phenomenon. In particular, we will see that this is typically the case for heuristics which perform poorly in a hill-climbing search. We interpret this as evidence that commonly used rule learning heuri...
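The difference between hill-climbing and beam search can be sketched by parametrizing a single-rule search with a beam width: a width of 1 is greedy hill-climbing, and a very wide beam approaches exhaustive search over conjunctions. The sketch assumes nominal attributes, the Laplace estimate (p + 1)/(p + n + 2) as heuristic, a hypothetical function `beam_search_rule`, and a made-up toy dataset in which the best single test is a dead end. It illustrates the mechanism only; it is not the paper's experimental setup and says nothing about over-searching on real data.

```python
from typing import Callable, Dict, List, Tuple

Example = Dict[str, int]
Rule = Tuple[Tuple[str, int], ...]   # sorted conjunction of attribute = value tests
Heuristic = Callable[[int, int, int, int], float]


def covers(rule: Rule, x: Example) -> bool:
    return all(x.get(a) == v for a, v in rule)


def counts(rule: Rule, pos: List[Example], neg: List[Example]) -> Tuple[int, int]:
    return (sum(covers(rule, x) for x in pos),
            sum(covers(rule, x) for x in neg))


def laplace(p: int, n: int, P: int, N: int) -> float:
    return (p + 1) / (p + n + 2)


def beam_search_rule(pos: List[Example], neg: List[Example],
                     heuristic: Heuristic, beam_width: int) -> Rule:
    """Top-down beam search for one rule; beam_width=1 is hill-climbing."""
    P, N = len(pos), len(neg)

    def score(rule: Rule) -> float:
        return heuristic(*counts(rule, pos, neg), P, N)

    best: Rule = ()
    best_score = score(best)
    beam: List[Rule] = [best]
    while beam:
        # Specialize every rule in the beam by one test taken from a covered positive.
        refinements = set()
        for rule in beam:
            used = {a for a, _ in rule}
            for x in pos:
                if covers(rule, x):
                    for a, v in x.items():
                        if a not in used:
                            refinements.add(tuple(sorted(rule + ((a, v),))))
        # Keep the beam_width best refinements (ties broken by comparing the rules).
        beam = sorted(refinements, key=lambda r: (score(r), r), reverse=True)[:beam_width]
        if beam and score(beam[0]) > best_score:
            best, best_score = beam[0], score(beam[0])
    return best


# Toy data: c = 1 is the best single test, but a = 1 AND b = 1 is the best rule.
pos = [{"a": 1, "b": 1, "c": 0}] * 3 + [{"a": 0, "b": 0, "c": 1}] * 2
neg = ([{"a": 1, "b": 0, "c": 0}] * 2 + [{"a": 0, "b": 1, "c": 0}] * 2
       + [{"a": 0, "b": 0, "c": 1}])
print(beam_search_rule(pos, neg, laplace, 1))
print(beam_search_rule(pos, neg, laplace, 2))
```

On this toy data a beam width of 1 stops at the rule c = 1 (Laplace 0.6), while a width of 2 keeps b = 1 in the beam and reaches a = 1 ∧ b = 1 (Laplace 0.8).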

Rule induction with a genetic sequential covering algorithm (geseco)

2000

Lists of if-then rules (i.e., ordered rule sets) are among the most expressive and intelligible representations for inductive learning algorithms. Two extreme strategies for searching for such lists of rules can be distinguished: (i) local strategies, primarily based on a step-by-step search for the optimal list of rules, and (ii) global strategies, primarily based on a one-strike search for the optimal list of rules. Both approaches have their disadvantages. In this paper we present an intermediate strategy: a sequential covering strategy is combined with a one-strike genetic search for the most promising next rule. To achieve this, a new rule-fitness function is introduced. Experimental results are reported in which the learning results of our intermediate approach are compared to those of other rule learning algorithms.

A Hyper-Heuristic for Descriptive Rule Induction

International Journal of Data Warehousing and Mining, 2007

Rule induction from examples is a machine learning technique that finds rules of the form condition → class, where condition and class are logic expressions of the form $\mathit{variable}_1 = \mathit{value}_1 \wedge \mathit{variable}_2 = \mathit{value}_2 \wedge \dots \wedge \mathit{variable}_k = \mathit{value}_k$. There are in general three approaches to rule induction: exhaustive search, divide-and-conquer, and separate-and-conquer (or its extension, weighted covering). Depending on the rule search heuristic used, the third approach can avoid the problem of producing many redundant rules (a limitation of the first approach) or non-overlapping rules (a limitation of the second approach).

Classification learning using all rules

Lecture Notes in Computer Science, 1998

The covering algorithm has been ubiquitous in the induction of classification rules. This approach to machine learning uses heuristic search that seeks to find a minimum number of rules that adequately explain the data. However, recent research has provided evidence that learning redundant classifiers can increase predictive accuracy. Learning all possible classifiers seems to be a plausible ultimate form of this notion of redundant classifiers. This paper presents an algorithm that in effect learns all classifiers. Preliminary investigation by Webb (1996b) suggested that a heuristic covering algorithm in general learns classification rules with higher predictive accuracy than those learned by this new approach. In this paper we present an extensive empirical comparison between the learning-all-rules algorithm and three varied established approaches to inductive learning, namely, a covering algorithm, an instance-based learner and a decision tree learner. Empirical evaluation provides strong evidence in support of learning-all-rules as a plausible approach to inductive learning.