gcorani - JNCC2 Credal Classifier 2

Statistical classification

JNCC2 addresses the problem of statistical classification. In general, a classifier learns from data the relationship between a set of attributes (also called features) that characterize a given object and the class the object belongs to. For instance, e-mail filtering is a classification problem: the classifier analyzes the frequency of certain keywords in the message to decide whether the message is an ordinary e-mail or spam. Automated reading of postal codes, handwritten character recognition and speech recognition are further examples of classification problems. For further information, visit the Wikipedia page about statistical classification.
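To make the e-mail example concrete, here is a minimal sketch of a toy naive Bayes spam filter over two binary keyword attributes. It is not JNCC2 code: the class name `ToyNaiveBayes`, the keywords and the six training messages are all made up for illustration, Laplace smoothing is an assumed choice, and the code assumes Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy naive Bayes classifier over binary keyword attributes (illustrative only, not JNCC2). */
public class ToyNaiveBayes {
    private final Map<String, Integer> classCounts = new HashMap<>();
    // featureCounts.get(cls)[i] = number of training messages of class cls containing keyword i
    private final Map<String, int[]> featureCounts = new HashMap<>();
    private final int numFeatures;
    private int total = 0;

    public ToyNaiveBayes(int numFeatures) {
        this.numFeatures = numFeatures;
    }

    /** Updates the counts with one labelled message. */
    public void learn(boolean[] keywords, String cls) {
        classCounts.merge(cls, 1, Integer::sum);
        int[] counts = featureCounts.computeIfAbsent(cls, c -> new int[numFeatures]);
        for (int i = 0; i < numFeatures; i++) {
            if (keywords[i]) {
                counts[i]++;
            }
        }
        total++;
    }

    /** Returns the class with the highest posterior score (computed in log space). */
    public String classify(boolean[] keywords) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> e : classCounts.entrySet()) {
            int n = e.getValue();
            double score = Math.log((double) n / total); // class prior
            int[] counts = featureCounts.get(e.getKey());
            for (int i = 0; i < numFeatures; i++) {
                // Laplace-smoothed estimate of P(keyword i present | class)
                double pPresent = (counts[i] + 1.0) / (n + 2.0);
                score += Math.log(keywords[i] ? pPresent : 1.0 - pPresent);
            }
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Attributes: [contains "free", contains "meeting"] (hypothetical keywords)
        ToyNaiveBayes nb = new ToyNaiveBayes(2);
        nb.learn(new boolean[] {true, false}, "spam");
        nb.learn(new boolean[] {true, false}, "spam");
        nb.learn(new boolean[] {true, true}, "spam");
        nb.learn(new boolean[] {false, true}, "ordinary");
        nb.learn(new boolean[] {false, true}, "ordinary");
        nb.learn(new boolean[] {false, false}, "ordinary");

        System.out.println(nb.classify(new boolean[] {true, false})); // prints "spam"
        System.out.println(nb.classify(new boolean[] {false, true})); // prints "ordinary"
    }
}
```

A classifier of this kind always returns exactly one class, however little evidence it has seen.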

Naive Credal Classifier 2

JNCC2 is the Java implementation of the Naive Credal Classifier 2 (NCC2; Corani and Zaffalon, 2008). NCC2 extends the traditional Naive Bayes Classifier (NBC) towards imprecise probabilities; it is designed to return robust classifications, even on small and/or _incomplete_ data sets. A distinctive feature of NCC2 is that it returns set-valued (or imprecise) classifications (i.e., more than one class) when faced with doubtful instances.

Extensive empirical investigation shows that NCC2 returns imprecise judgments on instances whose classification is genuinely doubtful: NBC achieves a much higher classification accuracy on the instances that NCC2 classifies precisely than on those that NCC2 classifies imprecisely.
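The idea of a set-valued classification can be sketched as follows. This is a simplified illustration and not the dominance test used by NCC2, which is described in Corani and Zaffalon (2008) and is not reproduced here. The sketch assumes that each class comes with a lower and an upper probability bound (the numbers below are made up) and drops a class only when some other class has a lower bound above its upper bound; the class name `SetValuedSketch` and the method `nonDominated` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified sketch of set-valued classification via interval dominance (not NCC2's test). */
public class SetValuedSketch {

    /** Returns the indices of the classes that are not interval-dominated by any other class. */
    public static List<Integer> nonDominated(double[] lower, double[] upper) {
        List<Integer> kept = new ArrayList<>();
        for (int c = 0; c < lower.length; c++) {
            boolean dominated = false;
            for (int other = 0; other < lower.length; other++) {
                // Class c is dropped if another class is more probable under every bound.
                if (other != c && lower[other] > upper[c]) {
                    dominated = true;
                    break;
                }
            }
            if (!dominated) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Clear-cut instance: the intervals do not overlap, so a single class is returned.
        System.out.println(nonDominated(new double[] {0.7, 0.1}, new double[] {0.9, 0.3})); // prints [0]

        // Doubtful instance: the intervals overlap, so both classes are returned.
        System.out.println(nonDominated(new double[] {0.4, 0.35}, new double[] {0.65, 0.6})); // prints [0, 1]
    }
}
```

Returning both classes in the second case signals that the available evidence does not single out one class, instead of forcing a possibly unreliable choice.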

Requirements

Because JNCC2 is written in Java, it runs on any operating system for which a Java Runtime Environment is available. Running JNCC2 requires the Java Runtime Environment, release 5.0 or above, which can be downloaded from the Sun Download Center.

JNCC2 runs from the command line and requires little memory.

Download

JNCC2 is open source; it is released under the terms of the GNU GPL.

The latest release of JNCC2 is 1.11 (October 2008). The zip file available for download contains:

>> download jncc2, version 1.11

CHANGELOG
Version 1.11 provides the following improvements over version 1.1:

OLD RELEASES

All users are encouraged to use the latest release. However, the previous releases are still available from this archive.

Bibliography

Naive Credal Classifier 2:

G. Corani, M. Zaffalon, “Learning Reliable Classifiers From Small or Incomplete Data Sets: The Naive Credal Classifier 2”, Journal of Machine Learning Research, 9, 581-621, 2008. >> download

Earlier works: Naive Credal Classifier

NCC2 extends the earlier NCC by incorporating a much more flexible and powerful methodology for dealing with missing data. Earlier works on the Naive Credal Classifier (NCC), authored by Marco Zaffalon, include:

Data sets

JNCC2 loads data from ARFF files. ARFF (Attribute-Relation File Format) is a textual format designed for classification problems. It was originally developed for WEKA, an open source software package that implements a wide collection of data mining algorithms.

Since WEKA has become a standard tool for data analysis, large repositories of ARFF data sets have been set up. The WEKA page of data sets is a good starting point for browsing through ARFF repositories.

Beware that JNCC2 works only on _classification_ data sets, not on regression data sets.
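As a rough way to tell the two kinds of data set apart, the sketch below scans an ARFF header and checks whether the class attribute is nominal (declared as a list of values in braces) or numeric. It is not part of JNCC2: the class name `ArffClassCheck` is hypothetical, the toy data are made up, and the code assumes that the class is the last declared attribute, which is a common WEKA convention but should be verified for the file at hand. The parsing is deliberately minimal and does not handle quoted attribute names containing spaces.

```java
import java.util.Locale;

/** Minimal ARFF header check (illustrative only): is the last attribute nominal? */
public class ArffClassCheck {

    /** Returns the type declaration of the last @attribute before @data, or null if none is found. */
    public static String lastAttributeType(String arffText) {
        String lastType = null;
        for (String line : arffText.split("\r?\n")) {
            String trimmed = line.trim();
            String lower = trimmed.toLowerCase(Locale.ROOT); // ARFF keywords are case-insensitive
            if (lower.startsWith("@data")) {
                break; // the header ends here
            }
            if (lower.startsWith("@attribute")) {
                // "@attribute <name> <type>": split into at most three tokens
                String[] parts = trimmed.split("\\s+", 3);
                if (parts.length == 3) {
                    lastType = parts[2].trim();
                }
            }
        }
        return lastType;
    }

    /** True if the last attribute is nominal, i.e. declared as {value1, value2, ...}. */
    public static boolean hasNominalClass(String arffText) {
        String type = lastAttributeType(arffText);
        return type != null && type.startsWith("{");
    }

    public static void main(String[] args) {
        String classification =
                "% toy classification data set\n"
                + "@relation weather\n"
                + "@attribute outlook {sunny, overcast, rainy}\n"
                + "@attribute humidity numeric\n"
                + "@attribute play {yes, no}\n"
                + "@data\n"
                + "sunny, 85, no\n"
                + "overcast, ?, yes\n"; // "?" marks a missing value

        String regression =
                "@relation temperature\n"
                + "@attribute outlook {sunny, overcast, rainy}\n"
                + "@attribute temperature numeric\n"
                + "@data\n"
                + "sunny, 29.5\n";

        System.out.println(hasNominalClass(classification)); // prints true
        System.out.println(hasNominalClass(regression));     // prints false
    }
}
```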