predict - Classify observations using naive Bayes classifier - MATLAB

Classify observations using naive Bayes classifier

Syntax

Description

label = predict(Mdl,X) returns a vector of predicted class labels for the predictor data in the table or matrix X, based on the trained naive Bayes classification model Mdl. The trained naive Bayes model can be either full or compact.


[label,Posterior,Cost] = predict(Mdl,X) also returns the Posterior Probability (Posterior) and predicted (expected) Misclassification Cost (Cost) corresponding to the observations (rows) in X. For each observation in X, the predicted class label corresponds to the minimum expected classification cost among all classes.
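As a quick check of that last point, the following minimal sketch (assuming a trained naive Bayes model Mdl and predictor data X, as in the examples below) verifies that each predicted label is the class with the minimum expected cost:

[label,Posterior,Cost] = predict(Mdl,X);
[~,minIdx] = min(Cost,[],2);              % index of the lowest-cost class per observation
isequal(label,Mdl.ClassNames(minIdx))     % returns true when each row has a unique minimum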


Examples


Load the fisheriris data set. Create X as a numeric matrix that contains four measurements for 150 irises. Create Y as a cell array of character vectors that contains the corresponding iris species.

load fisheriris
X = meas;
Y = species;
rng('default') % for reproducibility

Randomly partition observations into a training set and a test set with stratification, using the class information in Y. Specify a 30% holdout sample for testing.

cv = cvpartition(Y,'HoldOut',0.30);

Extract the training and test indices.

trainInds = training(cv);
testInds = test(cv);

Specify the training and test data sets.

XTrain = X(trainInds,:);
YTrain = Y(trainInds);
XTest = X(testInds,:);
YTest = Y(testInds);

Train a naive Bayes classifier using the predictors XTrain and class labels YTrain. A recommended practice is to specify the class names. fitcnb assumes that each predictor is conditionally normally distributed, given its class.

Mdl = fitcnb(XTrain,YTrain,'ClassNames',{'setosa','versicolor','virginica'})

Mdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'setosa'  'versicolor'  'virginica'}
            ScoreTransform: 'none'
           NumObservations: 105
         DistributionNames: {'normal'  'normal'  'normal'  'normal'}
    DistributionParameters: {3×4 cell}


Mdl is a trained ClassificationNaiveBayes classifier.

Predict the test sample labels.

idx = randsample(sum(testInds),10);
label = predict(Mdl,XTest);

Display the results for a random set of 10 observations in the test sample.

table(YTest(idx),label(idx),'VariableNames', ...
    {'TrueLabel','PredictedLabel'})

ans=10×2 table
      TrueLabel       PredictedLabel
    ______________    ______________

{'virginica' }    {'virginica' }
{'versicolor'}    {'versicolor'}
{'versicolor'}    {'versicolor'}
{'virginica' }    {'virginica' }
{'setosa'    }    {'setosa'    }
{'virginica' }    {'virginica' }
{'setosa'    }    {'setosa'    }
{'versicolor'}    {'versicolor'}
{'versicolor'}    {'virginica' }
{'versicolor'}    {'versicolor'}

Create a confusion chart from the true labels YTest and the predicted labels label.

cm = confusionchart(YTest,label);

Figure contains an object of type ConfusionMatrixChart.
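To summarize the chart with a single number, compute the overall test accuracy; this short sketch continues the example:

accuracy = mean(strcmp(YTest,label)) % fraction of correctly classified test observations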

Estimate posterior probabilities and misclassification costs for new observations using a naive Bayes classifier. Classify new observations using a memory-efficient pretrained classifier.

Load the fisheriris data set. Create X as a numeric matrix that contains four measurements for 150 irises. Create Y as a cell array of character vectors that contains the corresponding iris species.

load fisheriris
X = meas;
Y = species;
rng('default') % for reproducibility

Partition the data set into two sets: one containing the training data, and the other containing new, unobserved data. Reserve 10 observations for the new data set.

n = size(X,1);
newInds = randsample(n,10);
inds = ~ismember(1:n,newInds);
XNew = X(newInds,:);
YNew = Y(newInds);

Train a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. fitcnb assumes that each predictor is conditionally normally distributed, given its class.

Mdl = fitcnb(X(inds,:),Y(inds), ...
    'ClassNames',{'setosa','versicolor','virginica'});

Mdl is a trained ClassificationNaiveBayes classifier.

Conserve memory by reducing the size of the trained naive Bayes classifier.

CMdl = compact(Mdl);
whos('Mdl','CMdl')

  Name      Size            Bytes  Class                                                       Attributes

  CMdl      1x1              5534  classreg.learning.classif.CompactClassificationNaiveBayes
  Mdl       1x1             12907  ClassificationNaiveBayes

CMdl is a CompactClassificationNaiveBayes classifier. It uses less memory than Mdl because Mdl stores the data.
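Because the compact model does not store the training data, you can remove the full model from memory and still classify new observations; a minimal sketch continuing this example:

clear Mdl                      % free the memory held by the full model
labels = predict(CMdl,XNew);   % the compact model suffices for prediction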

Display the class names of CMdl using dot notation.

CMdl.ClassNames

ans = 3×1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }

Predict the labels. Estimate the posterior probabilities and expected class misclassification costs.

[labels,PostProbs,MisClassCost] = predict(CMdl,XNew);

Compare the true labels with the predicted labels.

table(YNew,labels,PostProbs,MisClassCost,'VariableNames', ...
    {'TrueLabels','PredictedLabels', ...
    'PosteriorProbabilities','MisclassificationCosts'})

ans=10×4 table
      TrueLabels      PredictedLabels              PosteriorProbabilities                     MisclassificationCosts
    ______________    _______________    _________________________________________    ______________________________________

{'virginica' }    {'virginica' }     4.0832e-268     4.6422e-09              1             1             1    4.6422e-09
{'setosa'    }    {'setosa'    }               1     3.0706e-18     4.6719e-25    3.0706e-18             1             1
{'virginica' }    {'virginica' }     1.0007e-246     5.8758e-10              1             1             1    5.8758e-10
{'versicolor'}    {'versicolor'}      1.2022e-61        0.99995     4.9859e-05             1    4.9859e-05       0.99995
{'virginica' }    {'virginica' }      2.687e-226     1.7905e-08              1             1             1    1.7905e-08
{'versicolor'}    {'versicolor'}      3.3431e-76        0.99971     0.00028983             1    0.00028983       0.99971
{'virginica' }    {'virginica' }       4.05e-166      0.0028527        0.99715             1       0.99715     0.0028527
{'setosa'    }    {'setosa'    }               1     1.1272e-14     2.0308e-23    1.1272e-14             1             1
{'virginica' }    {'virginica' }     1.3292e-228     8.3604e-10              1             1             1    8.3604e-10
{'setosa'    }    {'setosa'    }               1     4.5023e-17     2.1724e-24    4.5023e-17             1             1

PostProbs and MisClassCost are 10-by-3 numeric matrices, where each row corresponds to a new observation and each column corresponds to a class. The order of the columns corresponds to the order of CMdl.ClassNames.
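As a sanity check on this structure, each row of PostProbs sums to 1, and the column order matches the class order stored in the model; a short sketch continuing the example:

sum(PostProbs,2)   % each row sums to 1 (up to floating-point error)
CMdl.ClassNames    % column order of PostProbs and MisClassCost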

Load the fisheriris data set. Create X as a numeric matrix that contains petal length and width measurements for 150 irises. Create Y as a cell array of character vectors that contains the corresponding iris species.

load fisheriris
X = meas(:,3:4);
Y = species;

Train a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. fitcnb assumes that each predictor is conditionally normally distributed, given its class.

Mdl = fitcnb(X,Y,'ClassNames',{'setosa','versicolor','virginica'});

Mdl is a trained ClassificationNaiveBayes classifier.

Define a grid of values in the observed predictor space.

xMax = max(X);
xMin = min(X);
h = 0.01;
[x1Grid,x2Grid] = meshgrid(xMin(1):h:xMax(1),xMin(2):h:xMax(2));

Predict the posterior probabilities for each instance in the grid.

[~,PosteriorRegion] = predict(Mdl,[x1Grid(:),x2Grid(:)]);

Plot the posterior probability regions and the training data.

h = scatter(x1Grid(:),x2Grid(:),1,PosteriorRegion);
h.MarkerEdgeAlpha = 0.3;

Plot the data.

hold on
gh = gscatter(X(:,1),X(:,2),Y,'k','dx*');
title 'Iris Petal Measurements and Posterior Probabilities';
xlabel 'Petal length (cm)';
ylabel 'Petal width (cm)';
axis tight
legend(gh,'Location','Best')
hold off

Figure contains an axes object. The axes object with title Iris Petal Measurements and Posterior Probabilities, xlabel Petal length (cm), ylabel Petal width (cm) contains 4 objects of type scatter, line. One or more of the lines displays its values using only markers. These objects represent setosa, versicolor, virginica.

Input Arguments


Predictor data to be classified, specified as a numeric matrix or table.

Each row of X corresponds to one observation, and each column corresponds to one variable.

Data Types: table | double | single
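For example, if you train the model on a table, X must also be a table containing the same predictor variable names; a minimal sketch (the variable names here are illustrative assumptions):

load fisheriris
Tbl = array2table(meas, ...
    'VariableNames',{'SepalLength','SepalWidth','PetalLength','PetalWidth'});
MdlT = fitcnb(Tbl,species);          % train on a table of predictors
labelT = predict(MdlT,Tbl(1:5,:))    % X must be a table with the same predictor names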


Output Arguments


Predicted class labels, returned as a categorical vector, character array, logical or numeric vector, or cell array of character vectors.

The predicted class labels have the same data type as the observed class labels used to train Mdl, and their length equals the number of rows of X.

Class Posterior Probability, returned as a numeric matrix. Posterior has rows equal to the number of rows of X and columns equal to the number of distinct classes in the training data (size(Mdl.ClassNames,1)).

Posterior(j,k) is the predicted posterior probability of class k (that is, the class Mdl.ClassNames(k)) given the observation in row j of X.

Expected Misclassification Cost, returned as a numeric matrix. Cost has rows equal to the number of rows of X and columns equal to the number of distinct classes in the training data (size(Mdl.ClassNames,1)).

Cost(j,k) is the expected cost of classifying the observation in row j of X into class k (that is, the class Mdl.ClassNames(k)).
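Because the expected costs are the posteriors weighted by the true misclassification costs, you can reproduce Cost from Posterior and the model's cost matrix; a short sketch, assuming a trained model Mdl and predictor data X as in the examples above:

[~,Posterior,Cost] = predict(Mdl,X);
CostCheck = Posterior*Mdl.Cost;       % Cost(j,k) = sum over i of Posterior(j,i)*Mdl.Cost(i,k)
max(abs(Cost - CostCheck),[],'all')   % approximately 0, up to floating-point error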

More About


A misclassification cost is the relative severity of a classifier labeling an observation into the wrong class.

Two types of misclassification cost exist: true and expected. Let K be the number of classes.

The true misclassification cost per class is stored in the K-by-K matrix Mdl.Cost, where Cost(i,j) is the cost of classifying an observation into class j when its true class is i. By default, Cost(i,j) = 1 if i ≠ j, and Cost(i,j) = 0 if i = j.

The expected misclassification cost per observation is an averaged cost of classifying the observation into each class, weighted by the posterior probabilities. The expected cost of classifying an observation into class $j$ is

$$c_j = \sum_{k=1}^{K} \hat{P}(Y = k \mid x_1,\ldots,x_P)\,C(j \mid k), \quad j = 1,\ldots,K,$$

where $C(j \mid k)$ is the true cost of classifying an observation into class $j$ when its true class is $k$. predict assigns each observation to the class with the minimum expected cost.

The posterior probability is the probability that an observation belongs in a particular class, given the data.

For naive Bayes, the posterior probability that a classification is $k$ for a given observation ($x_1,\ldots,x_P$) is

$$\hat{P}(Y = k \mid x_1,\ldots,x_P) = \frac{P(X_1,\ldots,X_P \mid y = k)\,\pi(Y = k)}{P(X_1,\ldots,X_P)},$$

where:

$P(X_1,\ldots,X_P \mid y = k)$ is the conditional joint density of the predictors given they are in class $k$. Mdl.DistributionNames stores the distribution names of the predictors.

$\pi(Y = k)$ is the class prior probability distribution. Mdl.Prior stores the prior distribution.

$P(X_1,\ldots,X_P)$ is the joint density of the predictors. Because the classes are discrete, $P(X_1,\ldots,X_P) = \sum_{k=1}^{K} P(X_1,\ldots,X_P \mid y = k)\,\pi(Y = k)$.
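Under the default assumption of normally distributed predictors, you can reproduce this computation by hand from the stored distribution parameters; a sketch assuming a trained model Mdl with four numeric predictors, as in the examples above:

x = [5.1 3.5 1.4 0.2];                    % one observation (illustrative values)
K = numel(Mdl.ClassNames);
lik = ones(K,1);
for k = 1:K
    for p = 1:numel(x)
        mu = Mdl.DistributionParameters{k,p}(1);     % class-conditional mean
        sigma = Mdl.DistributionParameters{k,p}(2);  % class-conditional standard deviation
        lik(k) = lik(k)*normpdf(x(p),mu,sigma);      % product over predictors
    end
end
post = lik.*Mdl.Prior(:)/sum(lik.*Mdl.Prior(:))      % matches the Posterior output of predict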

The prior probability of a class is the assumed relative frequency with which observations from that class occur in a population.
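By default, fitcnb sets the prior to the empirical class frequencies in the training data; you can override it with the 'Prior' name-value argument, as in this brief sketch:

load fisheriris
MdlU = fitcnb(meas,species,'Prior','uniform');
MdlU.Prior   % [1/3 1/3 1/3] for the three iris classes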

Alternative Functionality

To integrate the prediction of a naive Bayes classification model into Simulink®, you can use the ClassificationNaiveBayes Predict block in the Statistics and Machine Learning Toolbox™ library or a MATLAB® Function block with the predict function. For examples, see Predict Class Labels Using ClassificationNaiveBayes Predict Block and Predict Class Labels Using MATLAB Function Block.

When deciding which approach to use, consider the following:

If you use the Statistics and Machine Learning Toolbox library block, you can use the Fixed-Point Tool (Fixed-Point Designer) to convert a floating-point model to fixed point.

Support for variable-size arrays must be enabled for a MATLAB Function block with the predict function.

If you use a MATLAB Function block, you can use MATLAB functions for preprocessing or post-processing before or after predictions in the same MATLAB Function block.

Extended Capabilities


This function fully supports tall arrays. You can use models trained on either in-memory or tall data with this function.
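For example, you can wrap in-memory data in a tall array and call predict on it; a minimal sketch reusing the trained model Mdl and test data XTest from the first example:

tX = tall(XTest);            % create a tall array (evaluation is deferred)
tLabels = predict(Mdl,tX);   % prediction is queued, not yet computed
labels = gather(tLabels);    % evaluate and bring the results into memory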

For more information, see Tall Arrays.

Usage notes and limitations:

For more information, see Introduction to Code Generation.

Version History

Introduced in R2014b