ClassificationNaiveBayes - Naive Bayes classification for multiclass classification - MATLAB

Naive Bayes classification for multiclass classification

Description

ClassificationNaiveBayes is a Naive Bayes classifier for multiclass learning. Trained ClassificationNaiveBayes classifiers store the training data, parameter values, data distribution, and prior probabilities. Use these classifiers to perform tasks such as estimating resubstitution predictions (see resubPredict) and predicting labels or posterior probabilities for new data (see predict).

Creation

Create a ClassificationNaiveBayes object by using fitcnb.
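
For example, the following minimal sketch (using the fisheriris data set that ships with Statistics and Machine Learning Toolbox) trains a classifier with default settings:

load fisheriris                 % loads meas (150-by-4) and species (150-by-1 cell)
Mdl = fitcnb(meas,species);     % Mdl is a ClassificationNaiveBayes object
class(Mdl)                      % returns 'ClassificationNaiveBayes'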

Properties


Predictor Properties

This property is read-only.

Predictor names, specified as a cell array of character vectors. The order of the elements in PredictorNames corresponds to the order in which the predictor names appear in the training data X.

This property is read-only.

Expanded predictor names, specified as a cell array of character vectors.

If the model uses dummy variable encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.

This property is read-only.

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).

Data Types: single | double
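
As a hedged sketch, you can mark predictors as categorical when training by using the CategoricalPredictors name-value argument of fitcnb; the variables X3 and y3 below are hypothetical data:

rng(1)                                              % for reproducibility
X3 = [randn(100,1) randi(3,100,1) randn(100,1)];    % column 2 holds category codes 1-3
y3 = randi(2,100,1);                                % two-class numeric labels
Mdl3 = fitcnb(X3,y3,'CategoricalPredictors',2);
Mdl3.CategoricalPredictors                          % returns 2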

This property is read-only.

Unstandardized predictors used to train the naive Bayes classifier, specified as a numeric matrix. Each row of X corresponds to one observation, and each column corresponds to one variable. The software excludes observations containing at least one missing value, and removes corresponding elements from Y.

Predictor Distribution Properties

This property is read-only.

Predictor distributions used to fit the data, specified as a character vector or a cell array of character vectors with one distribution name (such as 'normal', 'kernel', 'mvmn', or 'mn') per predictor. For details, see the Algorithms section.

Data Types: char | string | cell

This property is read-only.

Distribution parameter estimates, specified as a cell array. DistributionParameters{k,j} contains the parameter estimates for predictor j in class k (for example, the mean and standard deviation of a normal predictor).

This property is read-only.

Categorical predictor levels, specified as a cell array. For a predictor j with a multivariate multinomial distribution, CategoricalLevels{j} contains the sorted list of its unique levels (see the Algorithms section).

Data Types: char | string | cell

Since R2023b

This property is read-only.

Predictor means, specified as a numeric vector. If you specify Standardize as 1 or true when you train the naive Bayes classifier using fitcnb, then the length of the Mu vector is equal to the number of predictors. The vector contains 0 values for predictors with nonkernel distributions, such as categorical predictors (see DistributionNames).

If you set Standardize to 0 or false when you train the naive Bayes classifier using fitcnb, then the Mu value is an empty vector ([]).

Data Types: double

Since R2023b

This property is read-only.

Predictor standard deviations, specified as a numeric vector. If you specify Standardize as 1 or true when you train the naive Bayes classifier using fitcnb, then the length of the Sigma vector is equal to the number of predictors. The vector contains 1 values for predictors with nonkernel distributions, such as categorical predictors (see DistributionNames).

If you set Standardize to 0 or false when you train the naive Bayes classifier using fitcnb, then the Sigma value is an empty vector ([]).

Data Types: double
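
The following sketch (requires R2023b or later, and assumes the fisheriris data set) shows how Mu and Sigma are populated when you standardize kernel-distributed predictors:

load fisheriris
MdlStd = fitcnb(meas,species,'DistributionNames','kernel','Standardize',true);
MdlStd.Mu       % four-element vector of predictor means used for standardization
MdlStd.Sigma    % four-element vector of predictor standard deviations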


Response Properties

This property is read-only.

Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as the class labels Y.

Data Types: categorical | char | string | logical | double | cell

This property is read-only.

Response variable name, specified as a character vector.

Data Types: char | string

This property is read-only.

Class labels used to train the naive Bayes classifier, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Each row of Y represents the observed classification of the corresponding row of X.

Y has the same data type as the response data used to train the model. (The software treats string arrays as cell arrays of character vectors.)

Data Types: single | double | logical | char | string | cell | categorical

Training Properties

This property is read-only.

Parameter values used to train the ClassificationNaiveBayes model, specified as an object. ModelParameters contains parameter values such as the name-value pair argument values used to train the naive Bayes classifier.

Access the properties of ModelParameters by using dot notation. For example, access the kernel support using Mdl.ModelParameters.Support.

This property is read-only.

Number of training observations in the training data stored in X and Y, specified as a numeric scalar.

Data Types: double | single

This property is read-only.

Observation weights, specified as a vector of nonnegative values with the same number of rows as Y. Each entry in W specifies the relative importance of the corresponding observation in Y. fitcnb normalizes the value you set for the 'Weights' name-value pair argument, so that the weights within a particular class sum to the prior probability for that class.
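
As a quick check of this normalization (a sketch assuming the fisheriris data set), the stored weights within each class sum to that class's prior probability:

load fisheriris
w = rand(numel(species),1);                  % arbitrary positive observation weights
MdlW = fitcnb(meas,species,'Weights',w);
sum(MdlW.W(strcmp(species,'setosa')))        % approximately equal to MdlW.Prior(1)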

Classifier Properties

Prior class probabilities, specified as a numeric vector. The order of the elements in Prior corresponds to the order of the classes in ClassNames. You can modify this property after training by using dot notation, as shown in the first example.

Data Types: double | single

Score transformation, specified as a character vector, string scalar, or function handle. The default value 'none' applies no transformation to the predicted scores.

Data Types: char | string | function handle

Object Functions

compact Reduce size of machine learning model
compareHoldout Compare accuracies of two classification models using new data
crossval Cross-validate machine learning model
edge Classification edge for naive Bayes classifier
incrementalLearner Convert naive Bayes classification model to incremental learner
lime Local interpretable model-agnostic explanations (LIME)
logp Log unconditional probability density for naive Bayes classifier
loss Classification loss for naive Bayes classifier
margin Classification margins for naive Bayes classifier
partialDependence Compute partial dependence
plotPartialDependence Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
predict Classify observations using naive Bayes classifier
resubEdge Resubstitution classification edge
resubLoss Resubstitution classification loss
resubMargin Resubstitution classification margin
resubPredict Classify training data using trained classifier
shapley Shapley values
testckfold Compare accuracies of two classification models by repeated cross-validation

Examples


Create a naive Bayes classifier for Fisher's iris data set. Then, specify prior probabilities after training the classifier.

Load the fisheriris data set. Create X as a numeric matrix that contains four measurements for 150 irises. Create Y as a cell array of character vectors that contains the corresponding iris species.

load fisheriris
X = meas;
Y = species;

Train a naive Bayes classifier using the predictors X and class labels Y. By default, fitcnb assumes that the predictors are conditionally independent given the class and fits each predictor using a normal distribution.

Mdl = fitcnb(X,Y)

Mdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'setosa'  'versicolor'  'virginica'}
            ScoreTransform: 'none'
           NumObservations: 150
         DistributionNames: {'normal'  'normal'  'normal'  'normal'}
    DistributionParameters: {3×4 cell}


Mdl is a trained ClassificationNaiveBayes classifier. Some of the Mdl properties appear in the Command Window.

Display the properties of Mdl using dot notation. For example, display the class names and prior probabilities.

Mdl.ClassNames

ans = 3×1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }

Mdl.Prior

ans = 1×3

    0.3333    0.3333    0.3333

The order of the class prior probabilities in Mdl.Prior corresponds to the order of the classes in Mdl.ClassNames. By default, the prior probabilities are the respective relative frequencies of the classes in the data. Alternatively, you can set the prior probabilities when calling fitcnb by using the 'Prior' name-value pair argument.

Set the prior probabilities after training the classifier by using dot notation. For example, set the prior probabilities to 0.5, 0.2, and 0.3, respectively.

Mdl.Prior = [0.5 0.2 0.3];

You can now use this trained classifier to perform additional tasks. For example, you can label new measurements using predict or cross-validate the classifier using crossval.
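
For example, the following sketch labels a hypothetical new measurement with the classifier and its updated priors:

xNew = [5.9 3.0 5.1 1.8];             % one observation with four measurements
[label,posterior] = predict(Mdl,xNew);
label                                  % predicted iris species
posterior                              % posterior probability of each class, in ClassNames order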

Train and cross-validate a naive Bayes classifier. fitcnb implements 10-fold cross-validation by default. Then, estimate the cross-validated classification error.

Load the ionosphere data set. Remove the first two predictors for stability.

load ionosphere
X = X(:,3:end);
rng('default') % for reproducibility

Train and cross-validate a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. By default, fitcnb assumes that each predictor is conditionally normally distributed given its class.

CVMdl = fitcnb(X,Y,'ClassNames',{'b','g'},'CrossVal','on')

CVMdl = 
  ClassificationPartitionedModel
    CrossValidatedModel: 'NaiveBayes'
         PredictorNames: {'x1'  'x2'  'x3'  'x4'  'x5'  'x6'  'x7'  'x8'  'x9'  'x10'  'x11'  'x12'  'x13'  'x14'  'x15'  'x16'  'x17'  'x18'  'x19'  'x20'  'x21'  'x22'  'x23'  'x24'  'x25'  'x26'  'x27'  'x28'  'x29'  'x30'  'x31'  'x32'}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1×1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'


CVMdl is a ClassificationPartitionedModel cross-validated naive Bayes classifier. Alternatively, you can cross-validate a trained ClassificationNaiveBayes model by passing it to crossval.

Display the first trained fold of CVMdl using dot notation.

CVMdl.Trained{1}

ans = 
  CompactClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'b'  'g'}
            ScoreTransform: 'none'
         DistributionNames: {1×32 cell}
    DistributionParameters: {2×32 cell}


Each fold is a CompactClassificationNaiveBayes model trained on 90% of the data.

Cross-validated model objects such as CVMdl are not used for predicting on new data. Instead, use them to estimate the generalization error by passing CVMdl to kfoldLoss.

genError = kfoldLoss(CVMdl)

On average, the generalization error is approximately 19%.

You can specify a different conditional distribution for the predictors, or tune the conditional distribution parameters to reduce the generalization error.
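
For example, the following sketch cross-validates with nonparametric kernel conditional densities instead of the default normal distributions; whether this reduces the error depends on the data:

CVMdlKernel = fitcnb(X,Y,'ClassNames',{'b','g'}, ...
    'DistributionNames','kernel','CrossVal','on');
kernelError = kfoldLoss(CVMdlKernel)    % compare with genError above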

More About


Bag-of-Tokens Model

In the bag-of-tokens model, the value of predictor j is the nonnegative number of occurrences of token j in the observation. The number of categories (bins) in the multinomial model is the number of distinct tokens (the number of predictors).

Naive Bayes

Naive Bayes is a classification algorithm that applies density estimation to the data.

The algorithm leverages Bayes theorem, and (naively) assumes that the predictors are conditionally independent, given the class. Although the assumption is usually violated in practice, naive Bayes classifiers tend to yield posterior distributions that are robust to biased class density estimates, particularly where the posterior is 0.5 (the decision boundary) [1].

Naive Bayes classifiers assign observations to the most probable class (in other words, the maximum a posteriori decision rule). Explicitly, the algorithm takes these steps:

  1. Estimate the densities of the predictors within each class.
  2. Model posterior probabilities according to Bayes rule. That is, for all k = 1,...,K,

$$\hat{P}(Y=k \mid X_1,\dots,X_P) = \frac{\pi(Y=k)\prod_{j=1}^{P}P(X_j \mid Y=k)}{\sum_{k=1}^{K}\pi(Y=k)\prod_{j=1}^{P}P(X_j \mid Y=k)},$$

    where:
    • Y is the random variable corresponding to the class index of an observation.
    • X1,...,XP are the random predictors of an observation.
    • π(Y = k) is the prior probability that a class index is k.
  3. Classify an observation by estimating the posterior probability for each class, and then assign the observation to the class yielding the maximum posterior probability.

If the predictors compose a multinomial distribution, then the posterior probability is

$$\hat{P}(Y=k \mid X_1,\dots,X_P) \propto \pi(Y=k)\,P_{mn}(X_1,\dots,X_P \mid Y=k),$$

where $P_{mn}(X_1,\dots,X_P \mid Y=k)$ is the probability mass function of a multinomial distribution.
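
To make the decision rule concrete, here is an illustrative sketch (not the toolbox implementation) that computes the maximum a posteriori class for one observation under the Gaussian model, assuming Mdl is the classifier trained on fisheriris earlier on this page:

x = [5.9 3.0 5.1 1.8];                 % one observation; assumes Mdl from the first example
K = numel(Mdl.ClassNames);
logPost = log(Mdl.Prior(:));           % start from the log prior of each class
for k = 1:K
    for j = 1:numel(x)
        p = Mdl.DistributionParameters{k,j};             % [mean; std] for a normal predictor
        logPost(k) = logPost(k) + log(normpdf(x(j),p(1),p(2)));
    end
end
[~,kHat] = max(logPost);               % maximum a posteriori class index
Mdl.ClassNames(kHat)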

Algorithms


If predictor variable j has a conditional normal distribution (see the DistributionNames property), the software fits the distribution to the data by computing the class-specific weighted mean and the unbiased estimate of the weighted standard deviation. For each class k:

  • The weighted mean of predictor j is

$$\bar{x}_{j|k} = \frac{\sum_{i:y_i=k} w_i x_{ij}}{\sum_{i:y_i=k} w_i},$$

    where wi is the weight for observation i. The software normalizes weights within a class so that they sum to the prior probability for that class.
  • The unbiased estimator of the weighted standard deviation of predictor j is

$$s_{j|k} = \left[\frac{\sum_{i:y_i=k} w_i \left(x_{ij}-\bar{x}_{j|k}\right)^2}{z_{1|k}-\frac{z_{2|k}}{z_{1|k}}}\right]^{1/2},$$

    where z1|k is the sum of the weights within class k, and z2|k is the sum of the squared weights within class k.

If all predictor variables compose a conditional multinomial distribution (see the DistributionNames property), the software fits the distribution using the Bag-of-Tokens Model. The software stores the probability that token j appears in class k in the property DistributionParameters{k,j}. With additive smoothing [2], the estimated probability is

$$P(\text{token } j \mid \text{class } k) = \frac{1+c_{j|k}}{P+c_k},$$

where:

  • $c_{j|k} = n_k \frac{\sum_{i:y_i=k} x_{ij} w_i}{\sum_{i:y_i=k} w_i}$, which is the weighted number of occurrences of token j in class k.
  • nk is the number of observations in class k.
  • wi is the weight for observation i. The software normalizes weights within a class so that they sum to the prior probability for that class.
  • $c_k = \sum_{j=1}^{P} c_{j|k}$, which is the total weighted number of occurrences of all tokens in class k.

If predictor variable j has a conditional multivariate multinomial distribution (see the DistributionNames property), the software follows this procedure:

  1. The software collects a list of the unique levels, stores the sorted list in CategoricalLevels, and considers each level a bin. Each combination of predictor and class is a separate, independent multinomial random variable.
  2. For each class k, the software counts instances of each categorical level using the list stored in CategoricalLevels{j}.
  3. The software stores the probability that predictor j in class k has level L in the property DistributionParameters{k,j}, for all levels in CategoricalLevels{j}. With additive smoothing [2], the estimated probability is

$$P(\text{predictor } j = L \mid \text{class } k) = \frac{1+m_{j|k}(L)}{m_j+m_k},$$

    where:
    • $m_{j|k}(L) = n_k \frac{\sum_{i:y_i=k} I\{x_{ij}=L\} w_i}{\sum_{i:y_i=k} w_i}$, which is the weighted number of observations for which predictor j equals L in class k.
    • nk is the number of observations in class k.
    • $I\{x_{ij}=L\} = 1$ if $x_{ij} = L$, and 0 otherwise.
    • wi is the weight for observation i. The software normalizes weights within a class so that they sum to the prior probability for that class.
    • mj is the number of distinct levels in predictor j.
    • mk is the weighted number of observations in class k.

References

[1] Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer Series in Statistics. New York, NY: Springer, 2009. https://doi.org/10.1007/978-0-387-84858-7.

[2] Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. New York: Cambridge University Press, 2008.

Extended Capabilities


Usage notes and limitations:

For more information, see Introduction to Code Generation.

Version History

Introduced in R2014b


R2023b: Standardize predictors with kernel distributions

fitcnb supports the standardization of predictors with kernel distributions. That is, you can specify the Standardize name-value argument as true when the DistributionNames name-value argument includes at least one "kernel" distribution. Naive Bayes models include Mu and Sigma properties that contain the means and standard deviations, respectively, used to standardize the predictors before training. The properties are empty when fitcnb does not perform any standardization.