kfoldPredict - Predict responses for observations in cross-validated regression model - MATLAB
Predict responses for observations in cross-validated regression model
Syntax
Description
`yFit = kfoldPredict(CVMdl)` returns responses predicted by the cross-validated regression model `CVMdl`. For every fold, `kfoldPredict` predicts the responses for validation-fold observations using a model trained on training-fold observations. `CVMdl.X` and `CVMdl.Y` contain both sets of observations.
`yFit = kfoldPredict(CVMdl,Name,Value)` specifies options using one or more name-value arguments. For example, `'IncludeInteractions',true` specifies to include interaction terms in computations for generalized additive models.
`[yFit,ySD,yInt] = kfoldPredict(___)` also returns the standard deviations and prediction intervals of the response variable, evaluated at each observation in the predictor data `CVMdl.X`, using any of the input argument combinations in the previous syntaxes. This syntax applies only to generalized additive models (GAM) for which the `IsStandardDeviationFit` property of `CVMdl` is `true`.
Examples
When you create a cross-validated regression model, you can compute the mean squared error (MSE) by using the `kfoldLoss` object function. Alternatively, you can predict responses for validation-fold observations using `kfoldPredict` and compute the MSE manually.
Load the `carsmall` data set. Specify the predictor data `X` and the response data `Y`.
    load carsmall
    X = [Cylinders Displacement Horsepower Weight];
    Y = MPG;
Train a cross-validated regression tree model. By default, the software implements 10-fold cross-validation.
    rng('default') % For reproducibility
    CVMdl = fitrtree(X,Y,'CrossVal','on');
Compute the 10-fold cross-validation MSE by using `kfoldLoss`.
    mse = kfoldLoss(CVMdl)
Predict the responses `yfit` by using the cross-validated regression model. Compute the mean squared error between `yfit` and the true responses `CVMdl.Y`. The computed MSE matches the loss value returned by `kfoldLoss`.
    yfit = kfoldPredict(CVMdl);
    mse = mean((yfit - CVMdl.Y).^2)
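The comparison described in this example can be run end to end as a single script. The following is a minimal sketch under the same assumptions as the example (the `carsmall` data set and default 10-fold cross-validation). The variable names `lossValue`, `manualMSE`, and `difference` are illustrative and not part of the original example.

```matlab
load carsmall
X = [Cylinders Displacement Horsepower Weight];
Y = MPG;

rng('default') % For reproducibility
CVMdl = fitrtree(X,Y,'CrossVal','on');

lossValue = kfoldLoss(CVMdl);            % Cross-validation MSE from kfoldLoss
yfit = kfoldPredict(CVMdl);              % Validation-fold predictions
manualMSE = mean((yfit - CVMdl.Y).^2);   % MSE computed by hand

% Display the absolute difference between the two estimates.
difference = abs(lossValue - manualMSE)
```

Comparing against `CVMdl.Y` rather than the original `Y` keeps the rows aligned with the predictions, because `yfit` has one entry per row of `CVMdl.X`.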
Input Arguments
Name-Value Arguments
Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.
Example: `'Alpha',0.01,'IncludeInteractions',false` specifies the confidence level as 99% and excludes interaction terms from computations for a generalized additive model.
Alpha - Significance level for the confidence level of the prediction intervals `yInt`, specified as a numeric scalar in the range `[0,1]`. The confidence level of `yInt` is equal to `100(1 - Alpha)%`.
This argument is valid only for a generalized additive model object that includes the standard deviation fit. That is, you can specify this argument only when `CVMdl` is `RegressionPartitionedGAM` and the `IsStandardDeviationFit` property of `CVMdl` is `true`.
Example: 'Alpha',0.01
Data Types: single | double
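To illustrate how the significance level affects `yInt`, here is a hedged sketch that requests 95% and 99% prediction intervals from the same cross-validated GAM. It assumes the `carsmall` data set, removes rows with missing values up front with `rmmissing`, and trains with `fitrgam` using `'FitStandardDeviation',true` so that `IsStandardDeviationFit` is `true`. The choice of predictors and the variable names are illustrative assumptions.

```matlab
load carsmall
data = rmmissing([Displacement Horsepower Weight MPG]); % Drop rows with NaN
X = data(:,1:3);
Y = data(:,4);

rng('default') % For reproducibility
CVMdl = fitrgam(X,Y,'FitStandardDeviation',true,'CrossVal','on');

[~,~,yInt95] = kfoldPredict(CVMdl,'Alpha',0.05); % 95% prediction intervals
[~,~,yInt99] = kfoldPredict(CVMdl,'Alpha',0.01); % 99% prediction intervals

% Interval width = upper limit minus lower limit
width95 = yInt95(:,2) - yInt95(:,1);
width99 = yInt99(:,2) - yInt99(:,1);

% A smaller Alpha should give intervals that are at least as wide.
allAtLeastAsWide = all(width99 >= width95)
```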
IncludeInteractions - Flag to include interaction terms of the model, specified as `true` or `false`. This argument is valid only for a generalized additive model (GAM). That is, you can specify this argument only when `CVMdl` is `RegressionPartitionedGAM`.
The default value is `true` if the models in `CVMdl` (`CVMdl.Trained`) contain interaction terms. The value must be `false` if the models do not contain interaction terms.
Data Types: logical
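As a rough sketch of this flag in use, the following trains a cross-validated GAM that is allowed interaction terms and then predicts with and without them. It assumes the `carsmall` data set and uses the `'Interactions'` name-value argument of `fitrgam` to request up to two interaction terms. Whether the trained models actually contain interaction terms depends on the fit, and the variable names are illustrative.

```matlab
load carsmall
data = rmmissing([Displacement Horsepower Weight MPG]); % Drop rows with NaN
X = data(:,1:3);
Y = data(:,4);

rng('default') % For reproducibility
% Allow up to two interaction terms in each trained model.
CVMdl = fitrgam(X,Y,'Interactions',2,'CrossVal','on');

yWithInteractions = kfoldPredict(CVMdl); % Default behavior
yNoInteractions = kfoldPredict(CVMdl,'IncludeInteractions',false);

% Compare the cross-validation MSE of the two sets of predictions.
mseWithInteractions = mean((yWithInteractions - CVMdl.Y).^2)
mseNoInteractions = mean((yNoInteractions - CVMdl.Y).^2)
```

In R2021a and later, the same call can also be written as `kfoldPredict(CVMdl,IncludeInteractions=false)`.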
Since R2023b
PredictionForMissingValue - Predicted response value to use for observations with missing predictor values, specified as `"median"`, `"mean"`, or a numeric scalar. This argument is valid only for a Gaussian process regression, neural network, or support vector machine model. That is, you can specify this argument only when `CVMdl` is a `RegressionPartitionedGP`, `RegressionPartitionedNeuralNetwork`, or `RegressionPartitionedSVM` object.
Value | Description |
---|---|
"median" | kfoldPredict uses the median of the observed response values in the training-fold data as the predicted response value for observations with missing predictor values. This value is the default when CVMdl is a RegressionPartitionedGP, RegressionPartitionedNeuralNetwork, or RegressionPartitionedSVM object. |
"mean" | kfoldPredict uses the mean of the observed response values in the training-fold data as the predicted response value for observations with missing predictor values. |
Numeric scalar | kfoldPredict uses this value as the predicted response value for observations with missing predictor values. |
Example: "PredictionForMissingValue","mean"
Example: "PredictionForMissingValue",NaN
Data Types: single | double | char | string
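The three kinds of values in the table can be compared side by side. The sketch below requires R2023b or later. It assumes the `Horsepower` variable in `carsmall` contains at least one missing value and that `CVMdl.X` retains the rows with missing predictor values; if neither holds, the comparison table is simply empty. The variable names are illustrative.

```matlab
load carsmall
X = [Horsepower Weight]; % Assumption: Horsepower contains at least one NaN
Y = MPG;

rng('default') % For reproducibility
CVMdl = fitrsvm(X,Y,'Standardize',true,'CrossVal','on');

% Rows of the stored predictor data that have a missing value
missingRows = any(isnan(CVMdl.X),2);

yMedian = kfoldPredict(CVMdl); % Default: training-fold median
yMean = kfoldPredict(CVMdl,"PredictionForMissingValue","mean");
yNaN = kfoldPredict(CVMdl,"PredictionForMissingValue",NaN);

% Compare the three choices on the affected rows only.
comparison = table(yMedian(missingRows),yMean(missingRows),yNaN(missingRows), ...
    'VariableNames',{'Median','Mean','NaNFill'})
```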
Output Arguments
yFit - Predicted responses, returned as an n-by-1 numeric vector, where n is the number of observations. (n is `size(CVMdl.X,1)` when observations are in rows.) Each entry of `yFit` corresponds to the predicted response for the corresponding row of `CVMdl.X`.
If you use a holdout validation technique to create `CVMdl` (that is, if `CVMdl.KFold` is `1`), then `yFit` has `NaN` values for training-fold observations.
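The holdout behavior can be checked with a short sketch. It assumes the `carsmall` data set and a regression tree trained with `'Holdout',0.3`; the holdout fraction and variable names are illustrative. The `test` function of the `cvpartition` object stored in `CVMdl.Partition` identifies the held-out rows.

```matlab
load carsmall
X = [Cylinders Displacement Horsepower Weight];
Y = MPG;

rng('default') % For reproducibility
CVMdl = fitrtree(X,Y,'Holdout',0.3); % Hold out 30% of the data for validation

yfit = kfoldPredict(CVMdl);
isValidation = test(CVMdl.Partition); % Logical index of held-out rows

numFolds = CVMdl.KFold
% Training-fold rows have no validation prediction, so they are NaN.
trainingAllNaN = all(isnan(yfit(~isValidation)))
% Compute the MSE on the held-out rows only.
validationMSE = mean((yfit(isValidation) - CVMdl.Y(isValidation)).^2)
```

Because of the `NaN` entries, any summary statistic computed from `yfit` after holdout validation needs to be restricted to the validation rows.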
ySD - Standard deviations of the response variable, evaluated at each observation in the predictor data `CVMdl.X`, returned as a column vector of length n, where n is the number of observations in `CVMdl.X`. The `i`th element `ySD(i)` contains the standard deviation of the `i`th response for the `i`th observation `CVMdl.X(i,:)`, estimated using the trained standard deviation model in `CVMdl`.
This argument is valid only for a generalized additive model object that includes the standard deviation fit. That is, `kfoldPredict` can return this argument only when `CVMdl` is `RegressionPartitionedGAM` and the `IsStandardDeviationFit` property of `CVMdl` is `true`.
yInt - Prediction intervals of the response variable, evaluated at each observation in the predictor data `CVMdl.X`, returned as an n-by-2 matrix, where n is the number of observations in `CVMdl.X`. The `i`th row `yInt(i,:)` contains the estimated `100(1 - Alpha)%` prediction interval of the `i`th response for the `i`th observation `CVMdl.X(i,:)` using `ySD(i)`. The `Alpha` value is the probability that the prediction interval does not contain the true response value `CVMdl.Y(i)`. The first column of `yInt` contains the lower limits of the prediction intervals, and the second column contains the upper limits.
This argument is valid only for a generalized additive model object that includes the standard deviation fit. That is, `kfoldPredict` can return this argument only when `CVMdl` is `RegressionPartitionedGAM` and the `IsStandardDeviationFit` property of `CVMdl` is `true`.
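One way to sanity-check the prediction intervals is to measure how often they contain the observed responses. The following sketch assumes the `carsmall` data set and a cross-validated GAM trained with `'FitStandardDeviation',true`. The name `coverage` and the choice of `Alpha` are illustrative, and the empirical coverage on a small data set is only expected to be roughly near the nominal level.

```matlab
load carsmall
data = rmmissing([Displacement Horsepower Weight MPG]); % Drop rows with NaN
X = data(:,1:3);
Y = data(:,4);

rng('default') % For reproducibility
CVMdl = fitrgam(X,Y,'FitStandardDeviation',true,'CrossVal','on');

[yFit,ySD,yInt] = kfoldPredict(CVMdl,'Alpha',0.05);

% First column holds the lower limits, second column the upper limits.
inside = CVMdl.Y >= yInt(:,1) & CVMdl.Y <= yInt(:,2);

% Fraction of observed responses that fall inside their interval
coverage = mean(inside)
```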
Extended Capabilities
Usage notes and limitations:
- This function fully supports GPU arrays for the following models:
  - RegressionPartitionedEnsemble
  - RegressionPartitionedNeuralNetwork
  - RegressionPartitionedModel object fitted using fitrtree, or by passing a RegressionTree object to crossval
  - RegressionPartitionedSVM
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
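A minimal sketch of the GPU workflow follows. It assumes that Parallel Computing Toolbox is installed, that a supported GPU is available, and that the MATLAB release supports GPU arrays in `fitrtree`. The `canUseGPU` guard lets the script fall through on machines without a GPU, and the variable names are illustrative.

```matlab
if canUseGPU
    load carsmall
    % Move the predictor and response data to the GPU.
    X = gpuArray([Cylinders Displacement Horsepower Weight]);
    Y = gpuArray(MPG);

    rng('default') % For reproducibility
    CVMdl = fitrtree(X,Y,'CrossVal','on');

    yfit = kfoldPredict(CVMdl);
    isOnGPU = isgpuarray(yfit)  % Check where the predictions are stored
    yfitHost = gather(yfit);    % Copy the predictions to host memory
else
    disp('No supported GPU found; skipping the GPU example.')
end
```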
Version History
Introduced in R2011a
Starting in R2023b, when you predict or compute the loss, some regression models allow you to specify the predicted response value for observations with missing predictor values. Specify the `PredictionForMissingValue` name-value argument to use a numeric scalar, the training set median, or the training set mean as the predicted value. When computing the loss, you can also specify to omit observations with missing predictor values.
This table lists the object functions that support the `PredictionForMissingValue` name-value argument. By default, the functions use the training set median as the predicted response value for observations with missing predictor values.
Model Type | Model Objects | Object Functions |
---|---|---|
Gaussian process regression (GPR) model | RegressionGP, CompactRegressionGP | loss, predict, resubLoss, resubPredict |
Gaussian process regression (GPR) model | RegressionPartitionedGP | kfoldLoss, kfoldPredict |
Gaussian kernel regression model | RegressionKernel | loss, predict |
Gaussian kernel regression model | RegressionPartitionedKernel | kfoldLoss, kfoldPredict |
Linear regression model | RegressionLinear | loss, predict |
Linear regression model | RegressionPartitionedLinear | kfoldLoss, kfoldPredict |
Neural network regression model | RegressionNeuralNetwork, CompactRegressionNeuralNetwork | loss, predict, resubLoss, resubPredict |
Neural network regression model | RegressionPartitionedNeuralNetwork | kfoldLoss, kfoldPredict |
Support vector machine (SVM) regression model | RegressionSVM, CompactRegressionSVM | loss, predict, resubLoss, resubPredict |
Support vector machine (SVM) regression model | RegressionPartitionedSVM | kfoldLoss, kfoldPredict |
In previous releases, the regression model `loss` and `predict` functions listed above used `NaN` predicted response values for observations with missing predictor values. The software omitted observations with missing predictor values from the resubstitution ("resub") and cross-validation ("kfold") computations for prediction and loss.