ClassificationNaiveBayes
Naive Bayes classification for multiclass classification
Description
ClassificationNaiveBayes is a naive Bayes classifier for multiclass learning. Trained ClassificationNaiveBayes classifiers store the training data, parameter values, data distribution, and prior probabilities. Use these classifiers to perform tasks such as estimating resubstitution predictions (see resubPredict) and predicting labels or posterior probabilities for new data (see predict).
Creation
Create a ClassificationNaiveBayes object by using fitcnb.
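For example, a minimal sketch (using the fisheriris sample data set, which also appears in the Examples below):

```matlab
% Minimal sketch: train a naive Bayes classifier with default settings.
% fitcnb fits each predictor with a normal distribution by default.
load fisheriris                 % provides meas (150x4) and species (150x1)
Mdl = fitcnb(meas, species);    % Mdl is a ClassificationNaiveBayes object
```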
Properties
Predictor Properties
This property is read-only.
Predictor names, specified as a cell array of character vectors. The order of the elements in PredictorNames corresponds to the order in which the predictor names appear in the training data X.
This property is read-only.
Expanded predictor names, specified as a cell array of character vectors.

If the model uses dummy variable encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.
This property is read-only.
Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).
Data Types: single | double
This property is read-only.
Unstandardized predictors used to train the naive Bayes classifier, specified as a numeric matrix. Each row of X corresponds to one observation, and each column corresponds to one variable. The software excludes observations containing at least one missing value, and removes the corresponding elements from Y.
Predictor Distribution Properties
This property is read-only.

Predictor distributions, specified as a character vector or a cell array of character vectors with one element per predictor. fitcnb uses these distributions to model the predictors; for example, 'normal', 'kernel', 'mn', or 'mvmn' (see Algorithms).

Data Types: char | string | cell
This property is read-only.

Data Types: char | string | cell
Since R2023b
This property is read-only.
Predictor means, specified as a numeric vector. If you specify Standardize as 1 or true when you train the naive Bayes classifier using fitcnb, then the length of the Mu vector is equal to the number of predictors. The vector contains 0 values for predictors with nonkernel distributions, such as categorical predictors (see DistributionNames).

If you set Standardize to 0 or false when you train the naive Bayes classifier using fitcnb, then the Mu value is an empty vector ([]).
Data Types: double
Since R2023b
This property is read-only.
Predictor standard deviations, specified as a numeric vector. If you specify Standardize as 1 or true when you train the naive Bayes classifier using fitcnb, then the length of the Sigma vector is equal to the number of predictors. The vector contains 1 values for predictors with nonkernel distributions, such as categorical predictors (see DistributionNames).

If you set Standardize to 0 or false when you train the naive Bayes classifier using fitcnb, then the Sigma value is an empty vector ([]).
Data Types: double
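As an illustrative sketch (assuming the fisheriris sample data), the following trains a standardized kernel model and inspects Mu and Sigma; the two properties are populated only when Standardize is true:

```matlab
% Sketch: Standardize requires at least one kernel distribution (see fitcnb).
load fisheriris
Mdl = fitcnb(meas, species, ...
    'DistributionNames','kernel', 'Standardize',true);
[Mdl.Mu; Mdl.Sigma]   % 2x4: per-predictor means and standard deviations
```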
Response Properties
This property is read-only.

Unique class labels used in training, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as the class labels Y.

Data Types: categorical | char | string | logical | double | cell
This property is read-only.
Response variable name, specified as a character vector.
Data Types: char | string
This property is read-only.
Class labels used to train the naive Bayes classifier, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Each row of Y represents the observed classification of the corresponding row of X.

Y has the same data type as the response data used to train the model. (The software treats string arrays as cell arrays of character vectors.)

Data Types: single | double | logical | char | string | cell | categorical
Training Properties
This property is read-only.
Parameter values used to train the ClassificationNaiveBayes model, specified as an object. ModelParameters contains parameter values such as the name-value pair argument values used to train the naive Bayes classifier.

Access the properties of ModelParameters by using dot notation. For example, access the kernel support using Mdl.ModelParameters.Support.
This property is read-only.
Number of training observations in the training data stored in X and Y, specified as a numeric scalar.

Data Types: double | single
This property is read-only.
Observation weights, specified as a vector of nonnegative values with the same number of rows as Y. Each entry in W specifies the relative importance of the corresponding observation in Y. fitcnb normalizes the value you set for the 'Weights' name-value pair argument, so that the weights within a particular class sum to the prior probability for that class.
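A short sketch (assuming the fisheriris sample data) that verifies this normalization:

```matlab
% Sketch: after training, the weights within each class sum to that
% class's prior probability.
load fisheriris
Mdl = fitcnb(meas, species, 'Weights', ones(150,1));
for k = 1:numel(Mdl.ClassNames)
    inClass = strcmp(Mdl.Y, Mdl.ClassNames{k});
    fprintf('%-12s sum(W) = %.4f, Prior = %.4f\n', ...
        Mdl.ClassNames{k}, sum(Mdl.W(inClass)), Mdl.Prior(k));
end
```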
Classifier Properties
Prior probabilities, specified as a numeric vector. The order of the elements in Prior corresponds to the order of the classes in ClassNames. You can reset the prior probabilities after training by using dot notation (see the Examples).

Data Types: double | single

Score transformation, specified as a character vector, string scalar, or function handle (for example, 'none').

Data Types: char | string | function handle
Object Functions
| Function | Description |
|---|---|
| compact | Reduce size of machine learning model |
| compareHoldout | Compare accuracies of two classification models using new data |
| crossval | Cross-validate machine learning model |
| edge | Classification edge for naive Bayes classifier |
| incrementalLearner | Convert naive Bayes classification model to incremental learner |
| lime | Local interpretable model-agnostic explanations (LIME) |
| logp | Log unconditional probability density for naive Bayes classifier |
| loss | Classification loss for naive Bayes classifier |
| margin | Classification margins for naive Bayes classifier |
| partialDependence | Compute partial dependence |
| plotPartialDependence | Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots |
| predict | Classify observations using naive Bayes classifier |
| resubEdge | Resubstitution classification edge |
| resubLoss | Resubstitution classification loss |
| resubMargin | Resubstitution classification margin |
| resubPredict | Classify training data using trained classifier |
| shapley | Shapley values |
| testckfold | Compare accuracies of two classification models by repeated cross-validation |
Examples
Create a naive Bayes classifier for Fisher's iris data set. Then, specify prior probabilities after training the classifier.
Load the fisheriris data set. Create X as a numeric matrix that contains four measurements for 150 irises. Create Y as a cell array of character vectors that contains the corresponding iris species.

```matlab
load fisheriris
X = meas;
Y = species;
```
Train a naive Bayes classifier using the predictors X and class labels Y. fitcnb assumes the predictors are conditionally independent given the class, and fits each predictor using a normal distribution by default.

```matlab
Mdl = fitcnb(X,Y)
```

```
Mdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'setosa'  'versicolor'  'virginica'}
            ScoreTransform: 'none'
           NumObservations: 150
         DistributionNames: {'normal'  'normal'  'normal'  'normal'}
    DistributionParameters: {3×4 cell}
```
Mdl is a trained ClassificationNaiveBayes classifier. Some of the Mdl properties appear in the Command Window.
Display the properties of Mdl using dot notation. For example, display the class names and prior probabilities.

```matlab
Mdl.ClassNames
```

```
ans = 3×1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }
```

```matlab
Mdl.Prior
```

```
ans = 1×3

    0.3333    0.3333    0.3333
```
The order of the class prior probabilities in Mdl.Prior corresponds to the order of the classes in Mdl.ClassNames. By default, the prior probabilities are the respective relative frequencies of the classes in the data. Alternatively, you can set the prior probabilities when calling fitcnb by using the 'Prior' name-value pair argument.
Set the prior probabilities after training the classifier by using dot notation. For example, set the prior probabilities to 0.5, 0.2, and 0.3, respectively.
```matlab
Mdl.Prior = [0.5 0.2 0.3];
```
You can now use this trained classifier to perform additional tasks. For example, you can label new measurements using predict or cross-validate the classifier using crossval.
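For instance, a quick sketch that labels a few observations (here, reusing rows of the training data as stand-in new measurements):

```matlab
% Sketch: predict class labels and posterior probabilities.
[labels, posteriors] = predict(Mdl, X(1:3,:));
labels        % predicted species
posteriors    % one row per observation, columns ordered as in Mdl.ClassNames
```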
Train and cross-validate a naive Bayes classifier. When you specify 'CrossVal','on', fitcnb implements 10-fold cross-validation by default. Then, estimate the cross-validated classification error.
Load the ionosphere data set. Remove the first two predictors for stability.

```matlab
load ionosphere
X = X(:,3:end);
rng('default')  % for reproducibility
```
Train and cross-validate a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. fitcnb assumes that each predictor is conditionally and normally distributed.

```matlab
CVMdl = fitcnb(X,Y,'ClassNames',{'b','g'},'CrossVal','on')
```

```
CVMdl = 
  ClassificationPartitionedModel
    CrossValidatedModel: 'NaiveBayes'
         PredictorNames: {'x1'  'x2'  'x3'  'x4'  'x5'  'x6'  'x7'  'x8'  'x9'  'x10'  'x11'  'x12'  'x13'  'x14'  'x15'  'x16'  'x17'  'x18'  'x19'  'x20'  'x21'  'x22'  'x23'  'x24'  'x25'  'x26'  'x27'  'x28'  'x29'  'x30'  'x31'  'x32'}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1×1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'
```
CVMdl is a ClassificationPartitionedModel cross-validated naive Bayes classifier. Alternatively, you can cross-validate a trained ClassificationNaiveBayes model by passing it to crossval.
Display the first training fold of CVMdl using dot notation.

```matlab
CVMdl.Trained{1}
```

```
ans = 
  CompactClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'b'  'g'}
            ScoreTransform: 'none'
         DistributionNames: {1×32 cell}
    DistributionParameters: {2×32 cell}
```
Each fold is a CompactClassificationNaiveBayes model trained on 90% of the data.
A cross-validated model is not used to predict labels for new data. Instead, use it to estimate the generalization error by passing CVMdl to kfoldLoss.

```matlab
genError = kfoldLoss(CVMdl)
```

```
genError = 0.1900
```
On average, the generalization error is approximately 19%.
You can specify a different conditional distribution for the predictors, or tune the conditional distribution parameters to reduce the generalization error.
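For example, a hedged sketch that swaps in kernel density estimates for the default normal fits and recomputes the cross-validated error:

```matlab
% Sketch: cross-validate with kernel distributions instead of the default
% normal distributions, then compare the generalization errors.
rng('default')  % for reproducibility
CVKMdl = fitcnb(X, Y, 'ClassNames',{'b','g'}, ...
    'DistributionNames','kernel', 'CrossVal','on');
kernelError = kfoldLoss(CVKMdl)
```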
More About
Bag-of-Tokens Model

In the bag-of-tokens model, the value of predictor j is the nonnegative number of occurrences of token j in the observation. The number of categories (bins) in the multinomial model is the number of distinct tokens (the number of predictors).
Naive Bayes

Naive Bayes is a classification algorithm that applies density estimation to the data.
The algorithm leverages Bayes theorem, and (naively) assumes that the predictors are conditionally independent, given the class. Although the assumption is usually violated in practice, naive Bayes classifiers tend to yield posterior distributions that are robust to biased class density estimates, particularly where the posterior is 0.5 (the decision boundary) [1].
Naive Bayes classifiers assign observations to the most probable class (in other words, the maximum a posteriori decision rule). Explicitly, the algorithm takes these steps:
- Estimate the densities of the predictors within each class.
- Model posterior probabilities according to Bayes' rule. That is, for all k = 1,...,K,

  $$\hat{P}(Y = k \mid X_1, \ldots, X_P) = \frac{\pi(Y = k) \prod_{j=1}^{P} P(X_j \mid Y = k)}{\sum_{k=1}^{K} \pi(Y = k) \prod_{j=1}^{P} P(X_j \mid Y = k)},$$

  where:
  - Y is the random variable corresponding to the class index of an observation.
  - $X_1, \ldots, X_P$ are the random predictors of an observation.
  - $\pi(Y = k)$ is the prior probability that a class index is k.
- Classify an observation by estimating the posterior probability for each class, and then assigning the observation to the class that yields the maximum posterior probability.
If the predictors compose a multinomial distribution, then the posterior probability $\hat{P}(Y = k \mid X_1, \ldots, X_P) \propto \pi(Y = k) P_{mn}(X_1, \ldots, X_P \mid Y = k)$, where $P_{mn}(X_1, \ldots, X_P \mid Y = k)$ is the probability mass function of a multinomial distribution.
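To make the decision rule concrete, here is a sketch (assuming a Gaussian naive Bayes model trained on fisheriris, as in the Examples) that evaluates the posterior for one observation by hand:

```matlab
% Sketch: apply the maximum a posteriori rule manually for normal
% (Gaussian) conditional densities and compare with predict.
load fisheriris
Mdl = fitcnb(meas, species);
x = meas(1,:);                          % one observation
K = numel(Mdl.ClassNames);
unnormalized = zeros(1, K);
for k = 1:K
    p = Mdl.Prior(k);
    for j = 1:numel(x)
        % For a normal fit, DistributionParameters{k,j} holds [mean; std]
        theta = Mdl.DistributionParameters{k,j};
        p = p * normpdf(x(j), theta(1), theta(2));
    end
    unnormalized(k) = p;
end
posterior = unnormalized / sum(unnormalized)  % matches predict's 2nd output
[~, khat] = max(posterior);
Mdl.ClassNames(khat)                          % maximum a posteriori class
```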
Algorithms
If predictor variable j has a conditional normal distribution (see the DistributionNames property), the software fits the distribution to the data by computing the class-specific weighted mean and the unbiased estimate of the weighted standard deviation. For each class k (a verification sketch appears after this list):

- The weighted mean of predictor j is

  $$\bar{x}_{j|k} = \frac{\sum_{\{i:\, y_i = k\}} w_i x_{ij}}{\sum_{\{i:\, y_i = k\}} w_i},$$

  where $w_i$ is the weight for observation i. The software normalizes weights within a class such that they sum to the prior probability for that class.

- The unbiased estimator of the weighted standard deviation of predictor j is

  $$s_{j|k} = \left[\frac{\sum_{\{i:\, y_i = k\}} w_i \left(x_{ij} - \bar{x}_{j|k}\right)^2}{z_{1|k} - \dfrac{z_{2|k}}{z_{1|k}}}\right]^{1/2},$$

  where $z_{1|k}$ is the sum of the weights within class k and $z_{2|k}$ is the sum of the squared weights within class k.
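A brief sketch that checks these estimates against a trained model (with unit observation weights, the formulas reduce to the ordinary sample mean and the standard deviation normalized by n - 1):

```matlab
% Sketch: compare the stored normal-fit parameters with direct estimates.
load fisheriris
Mdl = fitcnb(meas, species);
inClass = strcmp(species, Mdl.ClassNames{1});   % class k = 'setosa'
[mean(meas(inClass,1)); std(meas(inClass,1))]   % predictor j = 1
Mdl.DistributionParameters{1,1}                 % stored [mean; std]
```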
If all predictor variables compose a conditional multinomial distribution (see the DistributionNames property), the software fits the distribution using the Bag-of-Tokens Model. The software stores the probability that token j appears in class k in the property DistributionParameters{k,j}. With additive smoothing [2], the estimated probability is

$$P(\text{token } j \mid \text{class } k) = \frac{1 + c_{j|k}}{P + c_k},$$

where (a numeric sketch follows the list):

- $c_{j|k} = n_k \dfrac{\sum_{\{i:\, y_i = k\}} x_{ij} w_i}{\sum_{\{i:\, y_i = k\}} w_i}$, which is the weighted number of occurrences of token j in class k.
- $n_k$ is the number of observations in class k.
- $w_i$ is the weight for observation i. The software normalizes weights within a class so that they sum to the prior probability for that class.
- $c_k = \sum_{j=1}^{P} c_{j|k}$, which is the total weighted number of occurrences of all tokens in class k.
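A small numeric sketch of this estimate (with hypothetical token counts and unit weights, so the weighted counts reduce to raw counts):

```matlab
% Sketch: additive (Laplace) smoothing for one class.
% Rows are observations in class k; columns are the P = 3 tokens.
Xk  = [2 0 1;
       1 1 0];
cjk = sum(Xk, 1);                  % weighted token counts c_{j|k}
ck  = sum(cjk);                    % total count c_k
P   = size(Xk, 2);                 % number of tokens
tokenProbs = (1 + cjk) / (P + ck)  % smoothed estimates; they sum to 1
```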
If predictor variable j has a conditional multivariate multinomial distribution (see the DistributionNames property), the software follows this procedure (a numeric sketch appears after the list):

- The software collects a list of the unique levels, stores the sorted list in CategoricalLevels, and considers each level a bin. Each combination of predictor and class is a separate, independent multinomial random variable.
- For each class k, the software counts instances of each categorical level using the list stored in CategoricalLevels{j}.
- The software stores the probability that predictor j in class k has level L in the property DistributionParameters{k,j}, for all levels in CategoricalLevels{j}. With additive smoothing [2], the estimated probability is

  $$P(\text{predictor } j = L \mid \text{class } k) = \frac{1 + m_{j|k}(L)}{m_j + m_k},$$

  where:

  - $m_{j|k}(L) = n_k \dfrac{\sum_{\{i:\, y_i = k\}} I\{x_{ij} = L\} w_i}{\sum_{\{i:\, y_i = k\}} w_i}$, which is the weighted number of observations for which predictor j equals L in class k.
  - $n_k$ is the number of observations in class k.
  - $I\{x_{ij} = L\} = 1$ if $x_{ij} = L$, and 0 otherwise.
  - $w_i$ is the weight for observation i. The software normalizes weights within a class so that they sum to the prior probability for that class.
  - $m_j$ is the number of distinct levels in predictor j.
  - $m_k$ is the weighted number of observations in class k.
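An analogous sketch for one categorical predictor (hypothetical levels, unit weights):

```matlab
% Sketch: additive smoothing for the levels of predictor j in class k.
xjk    = categorical({'a'; 'b'; 'a'; 'c'; 'a'});  % predictor j within class k
levels = categories(xjk);                          % sorted unique levels
mjkL   = countcats(xjk);                           % counts m_{j|k}(L)
mj     = numel(levels);                            % number of distinct levels
mk     = numel(xjk);                               % weighted observation count
levelProbs = (1 + mjkL) / (mj + mk)                % smoothed estimates per level
```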
References
[1] Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer Series in Statistics. New York, NY: Springer, 2009. https://doi.org/10.1007/978-0-387-84858-7\.
[2] Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. New York: Cambridge University Press, 2008.
Extended Capabilities
Usage notes and limitations:
- The predict function supports code generation.
- When you train a naive Bayes model by using fitcnb, the following restrictions apply.
  - The value of the 'DistributionNames' name-value pair argument cannot contain 'mn'.
  - The value of the 'ScoreTransform' name-value pair argument cannot be an anonymous function.
For more information, see Introduction to Code Generation.
Version History
Introduced in R2014b
R2023b: fitcnb supports the standardization of predictors with kernel distributions. That is, you can specify the Standardize name-value argument as true when the DistributionNames name-value argument includes at least one "kernel" distribution. Naive Bayes models include Mu and Sigma properties that contain the means and standard deviations, respectively, used to standardize the predictors before training. The properties are empty when fitcnb does not perform any standardization.