kfoldPredict - Predict responses for observations in cross-validated linear regression model - MATLAB ([original](http://www.mathworks.com/help/stats/classreg.learning.partition.regressionpartitionedlinear.kfoldpredict.html))

Predict responses for observations in cross-validated linear regression model

Syntax

    YHat = kfoldPredict(CVMdl)
    YHat = kfoldPredict(CVMdl,PredictionForMissingValue=prediction)

Description

YHat = kfoldPredict(CVMdl) returns the cross-validated responses predicted by the cross-validated linear regression model CVMdl. For every fold, kfoldPredict predicts the responses for validation-fold observations using a model trained on training-fold observations.

YHat contains predicted responses for each regularization strength in the linear regression models that compose CVMdl.
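The fold-wise procedure described above can be sketched in plain MATLAB. This is a minimal illustration, not how kfoldPredict is implemented: ordinary least squares (the backslash operator) stands in for the linear learner, and the data, the fold-assignment vector fold, and all other variable names are hypothetical.

```matlab
rng(1)                                   % for reproducibility
n = 200;                                 % hypothetical number of observations
K = 5;                                   % number of folds
X = randn(n,3);                          % hypothetical predictor data
y = X*[1; 2; 0] + 0.3*randn(n,1);        % hypothetical responses

fold = mod(randperm(n),K)' + 1;          % assign each observation to one of K folds
yHat = zeros(n,1);                       % out-of-fold predictions

for k = 1:K
    isValid = (fold == k);               % validation-fold observations
    isTrain = ~isValid;                  % training-fold observations

    % Fit least squares with an intercept on the training fold only
    A = [X(isTrain,:) ones(nnz(isTrain),1)];
    coef = A \ y(isTrain);

    % Predict the held-out observations with the model that never saw them
    yHat(isValid) = [X(isValid,:) ones(nnz(isValid),1)]*coef;
end
```

Each element of yHat comes from the model that was trained without that observation, which is the property kfoldPredict provides for CVMdl.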


YHat = kfoldPredict(CVMdl,PredictionForMissingValue=prediction) uses the prediction value as the predicted response for observations with missing values in the predictor data. By default, kfoldPredict uses the median of the observed response values in the training-fold data. (since R2023b)

Examples


Simulate 10000 observations from this model:

$$y = x_{100} + 2x_{200} + e.$$

    rng(1) % For reproducibility
    n = 1e4;
    d = 1e3;
    nz = 0.1;
    X = sprandn(n,d,nz);
    Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);

Cross-validate a linear regression model.

    CVMdl = fitrlinear(X,Y,'CrossVal','on')
    Mdl1 = CVMdl.Trained{1}

    CVMdl = 
      RegressionPartitionedLinear
        CrossValidatedModel: 'Linear'
               ResponseName: 'Y'
            NumObservations: 10000
                      KFold: 10
                  Partition: [1×1 cvpartition]
          ResponseTransform: 'none'


    Mdl1 = 
      RegressionLinear
             ResponseName: 'Y'
        ResponseTransform: 'none'
                     Beta: [1000×1 double]
                     Bias: 0.0107
                   Lambda: 1.1111e-04
                  Learner: 'svm'


By default, fitrlinear implements 10-fold cross-validation. CVMdl is a RegressionPartitionedLinear model. It contains the property Trained, which is a 10-by-1 cell array holding 10 RegressionLinear models that the software trained using the training set.

Predict responses for observations that fitrlinear did not use in training the folds.

yHat = kfoldPredict(CVMdl);

Because there is one regularization strength in Mdl1, yHat is a numeric vector.

Simulate 10000 observations as in Predict Cross-Validated Responses.

    rng(1) % For reproducibility
    n = 1e4;
    d = 1e3;
    nz = 0.1;
    X = sprandn(n,d,nz);
    Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);

Create a set of 15 logarithmically spaced regularization strengths from $10^{-5}$ through $10^{-1}$.

Lambda = logspace(-5,-1,15);

Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns. Specify using least squares with a lasso penalty and optimizing the objective function using SpaRSA.

    X = X';
    CVMdl = fitrlinear(X,Y,'ObservationsIn','columns','KFold',5,'Lambda',Lambda,...
        'Learner','leastsquares','Solver','sparsa','Regularization','lasso');

CVMdl is a RegressionPartitionedLinear model. Its Trained property contains a 5-by-1 cell array of trained RegressionLinear models, each of which holds out a different fold during training. Because fitrlinear trained using 15 regularization strengths, you can think of each RegressionLinear model as 15 models.

Predict cross-validated responses.

    YHat = kfoldPredict(CVMdl);
    size(YHat)

    ans = 1×2

           10000          15

    YHat(2,:)

    ans = 1×15

       -1.7338 -1.7332 -1.7319 -1.7299 -1.7266 -1.7239 -1.7135 -1.7210 -1.7324 -1.7063 -1.6397 -1.5112 -1.2631 -0.7841 -0.0096

YHat is a 10000-by-15 matrix. YHat(2,:) contains the cross-validated responses for observation 2, one for each of the 15 regularization strengths.
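One common use of an n-by-L matrix of cross-validated predictions is to compare regularization strengths by mean squared error. The sketch below uses small hypothetical values for Y, YHat, and Lambda rather than the output of this example. The kfoldLoss object function is the built-in route to a cross-validated loss.

```matlab
Y      = [1; 2; 3];                      % hypothetical observed responses (n-by-1)
YHat   = [1.1 0.5; 1.9 1.0; 3.2 1.5];    % hypothetical predictions (n-by-L)
Lambda = [1e-5 1e-1];                    % hypothetical regularization strengths

% Mean squared error for each column; Y expands across the L columns of YHat
mse = mean((Y - YHat).^2,1);

% Index and value of the regularization strength with the smallest error
[~,best] = min(mse);
bestLambda = Lambda(best);
```

With these values, the first column has the smaller error, so bestLambda is the first entry of Lambda.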

Input Arguments


Since R2023b

Predicted response value to use for observations with missing predictor values, specified as "median", "mean", or a numeric scalar.

| Value | Description |
| --- | --- |
| "median" | kfoldPredict uses the median of the observed response values in the training-fold data as the predicted response value for observations with missing predictor values. |
| "mean" | kfoldPredict uses the mean of the observed response values in the training-fold data as the predicted response value for observations with missing predictor values. |
| Numeric scalar | kfoldPredict uses this value as the predicted response value for observations with missing predictor values. |

Example: "mean"

Example: NaN

Data Types: single | double | char | string
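The three options can be illustrated with a small sketch in plain MATLAB. The training-fold responses yTrain, the validation-fold predictors XValid, and the stand-in predictions are all hypothetical, and the code only mimics the rule described in the table. It does not call kfoldPredict.

```matlab
yTrain = [1; 2; 3; 10];                  % hypothetical training-fold responses
XValid = [0.5 NaN; 1.0 2.0];             % hypothetical validation-fold predictors

% Rows of the validation fold that have at least one missing predictor value
hasMissing = any(isnan(XValid),2);

fillMedian = median(yTrain);             % value used with "median" (the default)
fillMean   = mean(yTrain);               % value used with "mean"
fillScalar = 0;                          % a user-supplied numeric scalar

% Stand-in predictions for the validation fold; rows with missing
% predictors are overwritten with the chosen fill value
yHat = [1.2; 3.4];                       % hypothetical model predictions
yHat(hasMissing) = fillMean;
```

With a cross-validated model, the corresponding call would be kfoldPredict(CVMdl,PredictionForMissingValue="mean").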

Output Arguments


Cross-validated predicted responses, returned as an _n_-by-_L_ numeric array. _n_ is the number of observations in the predictor data that created CVMdl (see X) and _L_ is the number of regularization strengths in CVMdl.Trained{1}.Lambda. YHat(_i_,_j_) is the predicted response for observation _i_ using the linear regression model that has regularization strength CVMdl.Trained{1}.Lambda(_j_).

The predicted response using the model with regularization strength _j_ is $\hat{y}_j = x\beta_j + b_j$.
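As a worked instance of this formula, the sketch below evaluates the prediction for one observation and two regularization strengths. The values of x, Beta, and Bias are hypothetical, and the d-by-L layout of Beta (one column per regularization strength) is an assumption made for illustration. In a fitted RegressionLinear model, the corresponding quantities appear as the Beta and Bias properties in the model display.

```matlab
x    = [1 0 2];                          % one observation as a row vector (1-by-d)
Beta = [0.5 0.4; 0 0; 1.0 0.9];          % hypothetical coefficients (d-by-L)
Bias = [0.1 0.2];                        % hypothetical biases (1-by-L)

% Predicted response for every regularization strength at once:
% element j equals x*Beta(:,j) + Bias(j)
yHat = x*Beta + Bias;
```

Here x*Beta is [2.5 2.2], so yHat is approximately [2.6 2.4], one predicted response per regularization strength.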

Extended Capabilities

Version History

Introduced in R2016a


kfoldPredict fully supports GPU arrays.

Starting in R2023b, when you predict or compute the loss, some regression models allow you to specify the predicted response value for observations with missing predictor values. Specify the PredictionForMissingValue name-value argument to use a numeric scalar, the training set median, or the training set mean as the predicted value. When computing the loss, you can also specify to omit observations with missing predictor values.

This table lists the object functions that support the PredictionForMissingValue name-value argument. By default, the functions use the training set median as the predicted response value for observations with missing predictor values.

| Model Type | Model Objects | Object Functions |
| --- | --- | --- |
| Gaussian process regression (GPR) model | RegressionGP, CompactRegressionGP | loss, predict, resubLoss, resubPredict |
| Gaussian process regression (GPR) model | RegressionPartitionedGP | kfoldLoss, kfoldPredict |
| Gaussian kernel regression model | RegressionKernel | loss, predict |
| Gaussian kernel regression model | RegressionPartitionedKernel | kfoldLoss, kfoldPredict |
| Linear regression model | RegressionLinear | loss, predict |
| Linear regression model | RegressionPartitionedLinear | kfoldLoss, kfoldPredict |
| Neural network regression model | RegressionNeuralNetwork, CompactRegressionNeuralNetwork | loss, predict, resubLoss, resubPredict |
| Neural network regression model | RegressionPartitionedNeuralNetwork | kfoldLoss, kfoldPredict |
| Support vector machine (SVM) regression model | RegressionSVM, CompactRegressionSVM | loss, predict, resubLoss, resubPredict |
| Support vector machine (SVM) regression model | RegressionPartitionedSVM | kfoldLoss, kfoldPredict |

In previous releases, the regression model loss and predict functions listed above used NaN predicted response values for observations with missing predictor values. The software omitted observations with missing predictor values from the resubstitution ("resub") and cross-validation ("kfold") computations for prediction and loss.