predict - Predict responses for Gaussian kernel regression model - MATLAB

Predict responses for Gaussian kernel regression model

Syntax

YFit = predict(Mdl,X)

YFit = predict(Mdl,X,PredictionForMissingValue=prediction)

Description

YFit = predict(Mdl,X) returns a vector of predicted responses for the predictor data in the matrix or table X, based on the Gaussian kernel regression model Mdl.

YFit = predict(Mdl,X,PredictionForMissingValue=prediction) uses the prediction value as the predicted response for observations with missing values in the predictor data X. By default, predict uses the median of the observed response values in the training data. (since R2023b)

Examples

Predict the test set responses using a Gaussian kernel regression model for the carbig data set.

Load the carbig data set.

load carbig

Specify the predictor variables (X) and the response variable (Y).

X = [Weight,Cylinders,Horsepower,Model_Year];
Y = MPG;

Delete rows of X and Y where either array has NaN values. Removing rows with NaN values before passing data to fitrkernel can speed up training and reduce memory usage.

R = rmmissing([X Y]);
X = R(:,1:4);
Y = R(:,end);

Reserve 10% of the observations as a holdout sample. Extract the training and test indices from the partition definition.

rng(10) % For reproducibility
N = length(Y);
cvp = cvpartition(N,'Holdout',0.1);
idxTrn = training(cvp); % Training set indices
idxTest = test(cvp);    % Test set indices

Train the regression kernel model. Standardize the training data.

Xtrain = X(idxTrn,:);
Ytrain = Y(idxTrn);
Mdl = fitrkernel(Xtrain,Ytrain,'Standardize',true)

Mdl = 
  RegressionKernel
              ResponseName: 'Y'
                   Learner: 'svm'
    NumExpansionDimensions: 128
               KernelScale: 1
                    Lambda: 0.0028
             BoxConstraint: 1
                   Epsilon: 0.8617

Mdl is a RegressionKernel model.

Predict responses for the test set.

Xtest = X(idxTest,:); Ytest = Y(idxTest);

YFit = predict(Mdl,Xtest);

Create a table containing the first 10 observed response values and predicted response values.

table(Ytest(1:10),YFit(1:10),'VariableNames', ...
    {'ObservedValue','PredictedValue'})

ans=10×2 table
    ObservedValue    PredictedValue
    _____________    ______________

         18              17.616    
         14              25.799    
         24              24.141    
         25              25.018    
         14              13.637    
         14              14.557    
         18              18.584    
         27              26.096    
         21              25.031    
         13              13.324    

Estimate the test set regression loss using the mean squared error loss function.

L = loss(Mdl,Xtest,Ytest)
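As a rough cross-check of what the mean squared error loss measures, the sketch below computes the MSE by hand from the output of predict and compares it with the value returned by loss. It uses small synthetic data rather than carbig (an assumption made to keep the block self-contained), and it relies on loss using its default loss function and equal observation weights.

```matlab
% Synthetic data for illustration only (not the carbig example above)
rng(0)                                        % For reproducibility
X = randn(200,3);
Y = X(:,1) - 2*X(:,2) + 0.1*randn(200,1);

idxTest = (1:200)' > 180;                     % Hold out the last 20 rows
Mdl = fitrkernel(X(~idxTest,:),Y(~idxTest));

Xtest = X(idxTest,:);
Ytest = Y(idxTest);
YFit = predict(Mdl,Xtest);

mseManual = mean((Ytest - YFit).^2);          % MSE computed directly from the predictions
mseLoss = loss(Mdl,Xtest,Ytest);              % MSE reported by the loss function
```

With equal weights, the two values should agree up to floating-point rounding.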

Input Arguments

Mdl: Kernel regression model, specified as a RegressionKernel model object. You can create a RegressionKernel model object using fitrkernel.

X: Predictor data used to generate responses, specified as a numeric matrix or table.

Each row of X corresponds to one observation, and each column corresponds to one variable.

Data Types: double | single | table
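The following sketch shows one way to pass predictor data as a table. It assumes the model was itself trained on a table, so the new table reuses the same predictor variable names. The variable names x1, x2, and y and the synthetic data are hypothetical and chosen only for illustration.

```matlab
% Hypothetical training table with two predictors and one response
rng(0)                                        % For reproducibility
Tbl = table(randn(100,1),randn(100,1),'VariableNames',{'x1','x2'});
Tbl.y = 2*Tbl.x1 - Tbl.x2 + 0.1*randn(100,1);

Mdl = fitrkernel(Tbl,'y');                    % Train using y as the response variable

% New observations: one row per observation, one variable per predictor
newTbl = table([0.2; -1.0],[1.5; 0.3],'VariableNames',{'x1','x2'});
YFit = predict(Mdl,newTbl);                   % One predicted response per row of newTbl
```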

Since R2023b

PredictionForMissingValue: Predicted response value to use for observations with missing predictor values, specified as "median", "mean", or a numeric scalar.

| Value | Description |
| --- | --- |
| "median" | predict uses the median of the observed response values in the training data as the predicted response value for observations with missing predictor values. |
| "mean" | predict uses the mean of the observed response values in the training data as the predicted response value for observations with missing predictor values. |
| Numeric scalar | predict uses this value as the predicted response value for observations with missing predictor values. |

Example: "mean"

Example: NaN

Data Types: single | double | char | string
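A minimal sketch of how the options might be compared, assuming R2023b or later and a small synthetic data set (the data and the variable names Xnew, yDefault, yMean, and yConst are illustrative, not part of the original example):

```matlab
rng(0)                                        % For reproducibility
Xtrain = randn(100,2);
Ytrain = 3*Xtrain(:,1) + 0.1*randn(100,1);
Mdl = fitrkernel(Xtrain,Ytrain);

Xnew = [0.5 -1; NaN 2];                       % Second row has a missing predictor value

yDefault = predict(Mdl,Xnew);                                 % Default: training set median
yMean = predict(Mdl,Xnew,PredictionForMissingValue="mean");   % Training set mean
yConst = predict(Mdl,Xnew,PredictionForMissingValue=0);       % Fixed numeric scalar
```

Only the second element of each result should depend on the option, because the first row of Xnew has no missing values.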

Output Arguments

YFit: Predicted responses, returned as a numeric vector.

YFit is an n-by-1 vector of the same data type as the response data (Y) used to train Mdl, where n is the number of observations in X.

Extended Capabilities

The predict function supports tall arrays with the following usage notes and limitations:

For more information, see Tall Arrays.
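As a hedged illustration of tall array usage, the sketch below wraps an in-memory numeric matrix in a tall array, predicts on it, and gathers the result. The synthetic data is an assumption, and the call to mapreducer(0) is an optional choice to run in the local MATLAB session rather than on a parallel pool.

```matlab
rng(0)                                        % For reproducibility
X = randn(500,3);
Y = X(:,1) + 0.1*randn(500,1);
Mdl = fitrkernel(X,Y);

mapreducer(0);                                % Evaluate tall arrays in the local session

Xnew = X(1:50,:);
tX = tall(Xnew);                              % Tall numeric matrix of predictor data
tYFit = predict(Mdl,tX);                      % Result is an unevaluated tall array
YFit = gather(tYFit);                         % Bring the predictions into memory
```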

Usage notes and limitations:

For more information, see Introduction to Code Generation.

Version History

Introduced in R2018a

predict fully supports GPU arrays.

Starting in R2023b, when you predict or compute the loss, some regression models allow you to specify the predicted response value for observations with missing predictor values. Specify the PredictionForMissingValue name-value argument to use a numeric scalar, the training set median, or the training set mean as the predicted value. When computing the loss, you can also specify to omit observations with missing predictor values.

This table lists the object functions that support the PredictionForMissingValue name-value argument. By default, the functions use the training set median as the predicted response value for observations with missing predictor values.

| Model Type | Model Objects | Object Functions |
| --- | --- | --- |
| Gaussian process regression (GPR) model | RegressionGP, CompactRegressionGP | loss, predict, resubLoss, resubPredict |
| | RegressionPartitionedGP | kfoldLoss, kfoldPredict |
| Gaussian kernel regression model | RegressionKernel | loss, predict |
| | RegressionPartitionedKernel | kfoldLoss, kfoldPredict |
| Linear regression model | RegressionLinear | loss, predict |
| | RegressionPartitionedLinear | kfoldLoss, kfoldPredict |
| Neural network regression model | RegressionNeuralNetwork, CompactRegressionNeuralNetwork | loss, predict, resubLoss, resubPredict |
| | RegressionPartitionedNeuralNetwork | kfoldLoss, kfoldPredict |
| Support vector machine (SVM) regression model | RegressionSVM, CompactRegressionSVM | loss, predict, resubLoss, resubPredict |
| | RegressionPartitionedSVM | kfoldLoss, kfoldPredict |

In previous releases, the regression model loss and predict functions listed above used NaN predicted response values for observations with missing predictor values. The software omitted observations with missing predictor values from the resubstitution ("resub") and cross-validation ("kfold") computations for prediction and loss.
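If code written for an earlier release relies on NaN predictions for incomplete observations, one possible approach in R2023b or later is to pass NaN as the numeric scalar. The sketch below assumes synthetic data and illustrative variable names.

```matlab
rng(0)                                        % For reproducibility
Xtrain = randn(100,2);
Ytrain = Xtrain(:,1) - Xtrain(:,2) + 0.1*randn(100,1);
Mdl = fitrkernel(Xtrain,Ytrain);

Xnew = [1 2; NaN 0.5];                        % Second row has a missing predictor value

% Request NaN for observations with missing predictor values
yNaN = predict(Mdl,Xnew,PredictionForMissingValue=NaN);
hasMissing = any(ismissing(Xnew),2);          % Rows that contain missing predictors
```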

You can generate C/C++ code for the predict function.