4. Quality and evaluation

4.1. Rule quality

An important factor determining the performance and comprehensibility of the resulting model is the choice of a rule quality measure. RuleKit provides the user with a number of state-of-the-art measures calculated on the basis of the confusion matrix. It is also possible to define custom measures. The confusion matrix consists of the number of positive and negative examples in the entire training set (P and N) and the number of positive and negative examples covered by the rule (p and n). Measures based on the confusion matrix can be used for classification and regression problems (note that for the former P and N are fixed for each analyzed class, while for the latter P and N are determined for every rule on the basis of covered examples). In survival problems, the log-rank statistic is always used to determine rule quality (for simplicity, all examples are assumed positive, thus N and n are equal to 0). All built-in measures are listed below together with their formulas.

Quality measure Formula
Accuracy
BinaryEntropy , where the probabilities can be calculated straightforwardly from the confusion matrix
C1
C2
CFoil
CNSignificnce
Coleman
Correlation
Coverage
FBayesianConfirmation
FMeasure
FullCoverage
GeoRSS
GMeasure
JMeasure
Kappa
Klosgen
Laplace
Lift
LogicalSufficiency
MEstimate
MutualSupport
Novelty
OddsRatio
OneWaySupport
PawlakDependencyFactor
Q2
Precision
RelativeRisk
Ripper
RuleInterest
RSS
SBayesian
Sensitivity
Specificity
TwoWaySupport
WeightedLaplace
WeightedRelativeAccuracy
YAILS
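
As an illustration of how a measure is derived from the confusion matrix quantities p, n, P and N described above, the following minimal Python sketch computes four measures using their common textbook definitions. The class and function names are hypothetical and are not part of the RuleKit API, and the formulas are the standard ones from the rule learning literature, which may differ in detail from RuleKit's built-in implementations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts used by confusion-matrix-based rule quality measures (hypothetical helper)."""
    p: float  # positive examples covered by the rule
    n: float  # negative examples covered by the rule
    P: float  # positive examples in the entire training set
    N: float  # negative examples in the entire training set


def precision(cm: ConfusionMatrix) -> float:
    # Fraction of covered examples that are positive (assumes p + n > 0).
    return cm.p / (cm.p + cm.n)


def coverage(cm: ConfusionMatrix) -> float:
    # Fraction of all positive examples that the rule covers (assumes P > 0).
    return cm.p / cm.P


def laplace(cm: ConfusionMatrix) -> float:
    # Precision with the two-class Laplace correction, which pulls
    # estimates for rules covering few examples towards 0.5.
    return (cm.p + 1) / (cm.p + cm.n + 2)


def lift(cm: ConfusionMatrix) -> float:
    # Precision relative to the prior probability of the positive class.
    return precision(cm) / (cm.P / (cm.P + cm.N))


# A rule covering 40 of 50 positives and 10 of 150 negatives.
cm = ConfusionMatrix(p=40, n=10, P=50, N=150)
print(precision(cm), coverage(cm), laplace(cm), lift(cm))
```

In this example the rule is four-fifths pure and covers four-fifths of the positive class, while the Laplace estimate is slightly lower than the raw precision because of the smoothing term.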

4.2. Model characteristics

These indicators are common to all types of problems, and their values are established during model construction.

Rule _p_-values are determined during model construction using the following tests:

4.3. Performance metrics

Performance metrics are established on the basis of the model outcome and the real example labels. They are specific to the investigated problem.
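
To make the idea of comparing model outcomes with real labels concrete, here is a minimal Python sketch of two widely used metrics: accuracy for classification and root mean squared error for regression. The function names are illustrative assumptions; they are not RuleKit functions, and the set of metrics RuleKit actually reports is not reproduced here.

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of examples whose predicted label equals the real label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == q for t, q in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between real and predicted numeric values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = ((t - q) ** 2 for t, q in zip(y_true, y_pred))
    return math.sqrt(sum(squared_errors) / len(y_true))


# Classification: three of four labels are predicted correctly.
print(accuracy(["a", "b", "a", "a"], ["a", "b", "b", "a"]))

# Regression: only the last prediction is off, by 2.
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```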

Classification

In binary classification problems, some additional metrics are computed:

Regression

Survival