CheckKernel
Class for examining the capabilities of kernels and finding problems with them. If you implement a kernel using the WEKA libraries, you should run these checks on it to ensure robustness and correct operation. Passing all the tests of this class does not mean the kernel is free of bugs, but it will help find some common ones.
Typical usage:
java weka.classifiers.functions.supportVector.CheckKernel -W kernel_name -- kernel_options
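The usage line above can also be assembled programmatically, for example to run the checker from a build script. The following is a minimal sketch that only builds the argument list; it does not depend on WEKA at compile time. The class name `CheckKernelCommand` and the classpath value `weka.jar` are assumptions made for illustration and should be adapted to the local installation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Builds a CheckKernel command line (hypothetical helper, not part of WEKA). */
final class CheckKernelCommand {

    static final String CHECKER =
        "weka.classifiers.functions.supportVector.CheckKernel";

    /**
     * @param classpath     location of the WEKA jar (an assumption, e.g. "weka.jar")
     * @param kernelName    fully qualified kernel class, passed to -W
     * @param kernelOptions options placed after "--" for the kernel itself
     */
    static List<String> build(String classpath, String kernelName,
                              String... kernelOptions) {
        List<String> cmd = new ArrayList<>(
            Arrays.asList("java", "-cp", classpath, CHECKER, "-W", kernelName));
        // Only emit the "--" separator when there is something to pass on.
        if (kernelOptions.length > 0) {
            cmd.add("--");
            cmd.addAll(Arrays.asList(kernelOptions));
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("weka.jar",
            "weka.classifiers.functions.supportVector.RBFKernel", "-G", "0.01");
        System.out.println(String.join(" ", cmd));
        // To actually launch the checker (requires Java and WEKA to be installed):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Keeping the command as a list rather than a single string avoids quoting problems when it is handed to `ProcessBuilder`.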
CheckKernel reports on the following:
- Kernel abilities
  - Possible command-line options of the kernel
  - Whether the kernel can predict nominal, numeric, string, date or relational class attributes
  - Whether the kernel can handle numeric predictor attributes
  - Whether the kernel can handle nominal predictor attributes
  - Whether the kernel can handle string predictor attributes
  - Whether the kernel can handle date predictor attributes
  - Whether the kernel can handle relational predictor attributes
  - Whether the kernel can handle multi-instance data
  - Whether the kernel can handle missing predictor values
  - Whether the kernel can handle missing class values
  - Whether a nominal kernel only handles two-class problems
  - Whether the kernel can handle instance weights
- Correct functioning
  - Correct initialisation during buildKernel (i.e. no result changes when buildKernel is called repeatedly)
  - Whether the kernel alters the data passed to it (number of instances, instance order, instance weights, etc.)
- Degenerate cases
  - building a kernel with zero training instances
  - all but one predictor attribute value missing
  - all predictor attribute values missing
  - all but one class value missing
  - all class values missing
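To make the "correct functioning" checks concrete, here is a small self-contained sketch of the idea behind two of them: building a kernel twice should give the same results, and building should not modify the data. It deliberately does not use WEKA's classes. `ToyKernel`, `ToyRbfKernel` and `ToyKernelChecks` are hypothetical stand-ins, and the data is a plain `double[][]` rather than WEKA instances, so this illustrates the technique and is not CheckKernel's actual implementation.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a kernel; this is NOT WEKA's kernel API. */
interface ToyKernel {
    void build(double[][] data);
    double eval(int i, int j);
}

/** RBF kernel exp(-gamma * ||x - y||^2) over the rows passed to build(). */
final class ToyRbfKernel implements ToyKernel {
    private final double gamma;
    private double[][] data;

    ToyRbfKernel(double gamma) { this.gamma = gamma; }

    @Override public void build(double[][] data) {
        // All state is replaced here, which keeps repeated builds consistent.
        this.data = data;
    }

    @Override public double eval(int i, int j) {
        double squaredDistance = 0.0;
        for (int k = 0; k < data[i].length; k++) {
            double d = data[i][k] - data[j][k];
            squaredDistance += d * d;
        }
        return Math.exp(-gamma * squaredDistance);
    }
}

final class ToyKernelChecks {

    /** Evaluates the kernel on every pair of the n built rows. */
    static double[][] gram(ToyKernel kernel, int n) {
        double[][] g = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                g[i][j] = kernel.eval(i, j);
        return g;
    }

    /** True if building twice on the same data yields identical results. */
    static boolean rebuildIsStable(ToyKernel kernel, double[][] data) {
        kernel.build(data);
        double[][] first = gram(kernel, data.length);
        kernel.build(data);
        double[][] second = gram(kernel, data.length);
        return Arrays.deepEquals(first, second);
    }

    /** True if building and evaluating leaves the input rows unchanged. */
    static boolean leavesDataUntouched(ToyKernel kernel, double[][] data) {
        double[][] snapshot = new double[data.length][];
        for (int i = 0; i < data.length; i++) snapshot[i] = data[i].clone();
        kernel.build(data);
        gram(kernel, data.length);
        return Arrays.deepEquals(snapshot, data);
    }
}
```

Both checks also run on an empty `double[0][]` array, which corresponds to the zero-training-instances degenerate case in the list above.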
Running CheckKernel with the debug option set will output the training and test datasets for any failed tests.
The weka.classifiers.AbstractKernelTest uses this class to test all the kernels. Any changes here have to be checked in that abstract test class, too.
Valid options are:
-D Turn on debugging output.
-S Silent mode - prints nothing to stdout.
-N The number of instances in the datasets (default 20).
-nominal The number of nominal attributes (default 2).
-nominal-values The number of values for nominal attributes (default 1).
-numeric The number of numeric attributes (default 1).
-string The number of string attributes (default 1).
-date The number of date attributes (default 1).
-relational The number of relational attributes (default 1).
-num-instances-relational The number of instances in relational/bag attributes (default 10).
-words The words to use in string attributes.
-word-separators The word separators to use in string attributes.
-W Full name of the kernel to analyse, e.g. weka.classifiers.functions.supportVector.RBFKernel (default weka.classifiers.functions.supportVector.RBFKernel)
Options specific to kernel weka.classifiers.functions.supportVector.RBFKernel:
-D Enables debugging output (if available) to be printed. (default: off)
-no-checks Turns off all checks - use with caution! (default: checks on)
-C The size of the cache (a prime number), 0 for full cache and -1 to turn it off. (default: 250007)
-G The Gamma parameter. (default: 0.01)
Options after -- are passed to the designated kernel.
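The `--` convention can be pictured as a simple split of the argument array: everything before the first `--` belongs to CheckKernel, everything after it to the kernel named by -W. The sketch below shows that split in isolation. `OptionSplit` is a hypothetical helper written for this illustration and is not WEKA's own option-parsing code.

```java
import java.util.Arrays;

/** Splits an argument array at the first "--" (hypothetical helper, not WEKA code). */
final class OptionSplit {

    /** Returns a two-element array: {checkerOptions, kernelOptions}. */
    static String[][] split(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("--")) {
                return new String[][] {
                    Arrays.copyOfRange(args, 0, i),
                    Arrays.copyOfRange(args, i + 1, args.length)
                };
            }
        }
        // No separator: every option belongs to the checker.
        return new String[][] { args.clone(), new String[0] };
    }

    public static void main(String[] args) {
        String[][] parts = split(new String[] {
            "-D", "-N", "20",
            "-W", "weka.classifiers.functions.supportVector.RBFKernel",
            "--", "-G", "0.01", "-C", "250007"
        });
        System.out.println("checker: " + Arrays.toString(parts[0]));
        System.out.println("kernel:  " + Arrays.toString(parts[1]));
    }
}
```

In this example `-D` before the separator turns on CheckKernel's debugging output, whereas a `-D` placed after `--` would be interpreted by the kernel instead.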