RegressionPartitionedKernel - Cross-validated kernel model for regression - MATLAB
Description
RegressionPartitionedKernel is a set of kernel regression models trained on cross-validated folds. To obtain a cross-validated kernel regression model, use fitrkernel and specify one of the cross-validation options. You can estimate the predictive quality of the model, or how well the kernel regression model generalizes, using one or more of these “kfold” methods: kfoldPredict and kfoldLoss.
Every “kfold” method uses models trained on training-fold observations to predict the response for validation-fold observations. For example, suppose that you cross-validate using five folds. In this case, the software randomly assigns each observation to one of five groups of roughly equal size. The training fold contains four of the groups (that is, roughly 4/5 of the data), and the validation fold contains the remaining group (roughly 1/5 of the data). Cross-validation then proceeds as follows:
- The software trains the first model (stored in CVMdl.Trained{1}) using the observations in the last four groups and reserves the observations in the first group for validation.
- The software trains the second model (stored in CVMdl.Trained{2}) using the observations in the first group and the last three groups. The software reserves the observations in the second group for validation.
- The software proceeds in a similar fashion for the third through fifth models.
If you validate by calling kfoldPredict, it computes predictions for the observations in group 1 using the first model, for group 2 using the second model, and so on. In short, the software estimates a response for every observation using the model trained without that observation.
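The fold bookkeeping described above can be sketched by hand: for each fold i, take the validation-fold observations from the cvpartition object in the Partition property and predict them with CVMdl.Trained{i}. This is a minimal sketch that assumes the simulated data from the Examples section; because the object does not store the predictor data, x must still be available in the workspace. The variable names yManual and maxDiff are illustrative only, and the result is expected to agree with kfoldPredict up to floating-point round-off.

```matlab
% Simulate data (same recipe as in the Examples section)
rng(0,'twister');                  % For reproducibility
n = 1000;
x = linspace(-10,10,n)';
y = 1 + x*2e-2 + sin(x)./x + 0.2*randn(n,1);

% 5-fold cross-validated kernel regression model
CVMdl = fitrkernel(x,y,'KFold',5);

% Rebuild the out-of-fold predictions one fold at a time
yManual = zeros(n,1);
for i = 1:CVMdl.KFold
    idx = test(CVMdl.Partition,i);                    % logical index of fold i's validation observations
    yManual(idx) = predict(CVMdl.Trained{i},x(idx));  % model i was trained without these rows
end

% Compare with the built-in out-of-fold predictions
yHat = kfoldPredict(CVMdl);
maxDiff = max(abs(yManual - yHat))
```

Every observation is predicted exactly once, by the only model that did not see it during training.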
Note
RegressionPartitionedKernel model objects do not store the predictor data set.
Creation
Create a RegressionPartitionedKernel object using the fitrkernel function. Use one of the 'CrossVal', 'CVPartition', 'Holdout', 'KFold', or 'Leaveout' name-value pair arguments in the call to fitrkernel. For details, see the fitrkernel function reference page.
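As a minimal sketch of the cross-validation options listed above, the calls below create RegressionPartitionedKernel objects from the simulated data used in the Examples section. 'CrossVal','on' uses 10 folds by default. 'Leaveout','on' trains one model per observation, so it is only noted in a comment for this 1000-observation data set. The variable names are illustrative.

```matlab
% Simulated data (same recipe as in the Examples section)
rng(0,'twister');
n = 1000;
x = linspace(-10,10,n)';
y = 1 + x*2e-2 + sin(x)./x + 0.2*randn(n,1);

CVMdl10 = fitrkernel(x,y,'CrossVal','on');  % 10-fold cross-validation (default number of folds)
CVMdl5  = fitrkernel(x,y,'KFold',5);        % 5-fold cross-validation
CVMdlHO = fitrkernel(x,y,'Holdout',0.2);    % hold out 20% of the data for validation

% Reuse a precomputed partition, for example to compare models on identical folds
c = cvpartition(n,'KFold',5);
CVMdlCP = fitrkernel(x,y,'CVPartition',c);

% fitrkernel(x,y,'Leaveout','on') would train n = 1000 models, which is expensive
```

Passing the same cvpartition object to several fitting calls is a common way to make their cross-validation losses directly comparable.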
Properties
Cross-Validation Properties
CrossValidatedModel
This property is read-only.
Cross-validated model name, specified as a character vector.
For example, 'Kernel' specifies a cross-validated kernel model.
Data Types: char
KFold
This property is read-only.
Number of cross-validated folds, specified as a positive integer scalar.
Data Types: double
ModelParameters
This property is read-only.
Cross-validation parameter values, specified as an object. The parameter values correspond to the name-value pair argument values used to cross-validate the kernel regression model. ModelParameters does not contain estimated parameters.
NumObservations
This property is read-only.
Number of observations in the training data, specified as a positive numeric scalar.
Data Types: double
Partition
This property is read-only.
Data partition indicating how the software splits the data into cross-validation folds, specified as a cvpartition model.
Trained
This property is read-only.
Kernel regression models trained on cross-validation folds, specified as a cell array of RegressionKernel models. Trained has k cells, where k is the number of folds.
Data Types: cell
W
This property is read-only.
Observation weights used to cross-validate the model, specified as a numeric vector. W has NumObservations elements.
The software normalizes the weights used for training so that sum(W,'omitnan') is 1.
Data Types: single | double
Y
This property is read-only.
Observed response values used to cross-validate the model, specified as a numeric vector. Y has NumObservations elements.
Each row of Y represents the observed response of the corresponding observation in the predictor data.
Data Types: single | double
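To see how the cross-validation properties relate to one another, here is a minimal sketch that inspects them on a 5-fold model built from the simulated data in the Examples section. TestSize is a property of the cvpartition object stored in Partition; the other variable names are illustrative.

```matlab
% Simulated data and a 5-fold cross-validated model
rng(0,'twister');
n = 1000;
x = linspace(-10,10,n)';
y = 1 + x*2e-2 + sin(x)./x + 0.2*randn(n,1);
CVMdl = fitrkernel(x,y,'KFold',5);

k = CVMdl.KFold                       % number of folds
nObs = CVMdl.NumObservations          % number of observations in the training data
nModels = numel(CVMdl.Trained)        % one RegressionKernel model per fold
foldSizes = CVMdl.Partition.TestSize  % validation-fold sizes from the cvpartition object
wSum = sum(CVMdl.W,'omitnan')         % per the W description, the weights are normalized
nY = numel(CVMdl.Y)                   % observed responses, one per observation
```

Because every observation lands in exactly one validation fold, the fold sizes add up to NumObservations.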
Other Regression Properties
CategoricalPredictors
This property is read-only.
Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).
Data Types: single | double
PredictorNames
This property is read-only.
Predictor names in order of their appearance in the predictor data, specified as a cell array of character vectors. The length of PredictorNames is equal to the number of columns used as predictor variables in the training data X or Tbl.
Data Types: cell
ResponseName
This property is read-only.
Response variable name, specified as a character vector.
Data Types: char
ResponseTransform
Response transformation function, specified as 'none' or a function handle. ResponseTransform describes how the software transforms raw response values predicted by the model.
For a MATLAB® function, or a function that you define, enter its function handle. For example, you can enter Mdl.ResponseTransform = @function, where function accepts a numeric vector of the original responses and returns a numeric vector of the same size containing the transformed responses.
Data Types: char | string | function_handle
Object Functions
Function | Description
---|---
gather | Gather properties of Statistics and Machine Learning Toolbox object from GPU
kfoldLoss | Regression loss for cross-validated kernel regression model
kfoldPredict | Predict responses for observations in cross-validated kernel regression model
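kfoldLoss can report the loss per fold as well as averaged over folds, and can use a loss other than mean squared error. The sketch below assumes the simulated data from the Examples section and uses the 'Mode' ('average' or 'individual') and 'LossFun' ('mse' or 'epsiloninsensitive') name-value arguments; check the kfoldLoss reference page for the options available in your release.

```matlab
% Simulated data and a 5-fold cross-validated model
rng(0,'twister');
n = 1000;
x = linspace(-10,10,n)';
y = 1 + x*2e-2 + sin(x)./x + 0.2*randn(n,1);
CVMdl = fitrkernel(x,y,'KFold',5);

Lavg  = kfoldLoss(CVMdl)                                 % MSE averaged over folds (default)
Lfold = kfoldLoss(CVMdl,'Mode','individual')             % one MSE per fold (5-by-1)
Leps  = kfoldLoss(CVMdl,'LossFun','epsiloninsensitive')  % epsilon-insensitive loss
```

The spread of the per-fold values in Lfold gives a rough sense of how much the generalization estimate depends on the particular split.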
Examples
Simulate sample data.
rng(0,'twister'); % For reproducibility
n = 1000;
x = linspace(-10,10,n)';
y = 1 + x*2e-2 + sin(x)./x + 0.2*randn(n,1);
Cross-validate a kernel regression model.
CVMdl = fitrkernel(x,y,'Kfold',5);
CVMdl is a RegressionPartitionedKernel 5-fold cross-validated model. CVMdl.Trained contains a cell vector of five RegressionKernel models. Display the Trained property.
CVMdl.Trained
ans=5×1 cell array
    {1×1 RegressionKernel}
    {1×1 RegressionKernel}
    {1×1 RegressionKernel}
    {1×1 RegressionKernel}
    {1×1 RegressionKernel}
Each cell contains a kernel regression model trained on four folds, then tested on the remaining fold.
Predict responses for observations in the validation folds and estimate the generalization error by passing CVMdl to kfoldPredict and kfoldLoss, respectively.
yHat = kfoldPredict(CVMdl);
L = kfoldLoss(CVMdl)
kfoldLoss computes the average mean squared error over all the folds by default. The estimated mean squared error is 0.1887.
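With uniform observation weights and equal-size folds, the default kfoldLoss value should equal the mean squared difference between the observed responses and the out-of-fold predictions. A minimal sketch of that check, rerunning the example above (mseManual is an illustrative name):

```matlab
% Rerun the example: simulated data and 5-fold cross-validated model
rng(0,'twister');
n = 1000;
x = linspace(-10,10,n)';
y = 1 + x*2e-2 + sin(x)./x + 0.2*randn(n,1);
CVMdl = fitrkernel(x,y,'KFold',5);

yHat = kfoldPredict(CVMdl);        % out-of-fold predictions
L = kfoldLoss(CVMdl);              % default: average MSE over folds
mseManual = mean((y - yHat).^2)    % the same quantity computed by hand
```

Here the 1000 observations split into five folds of 200, so averaging the per-fold MSEs and pooling all squared errors give the same number.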
Extended Capabilities
The object functions of a RegressionPartitionedKernel model fully support GPU arrays.
Version History
Introduced in R2018b
You can fit a RegressionPartitionedKernel object with GPU arrays by using fitrkernel. The RegressionPartitionedKernel object functions support GPU array input arguments so that the functions can execute on a GPU.
You can also gather the properties of a RegressionPartitionedKernel model object from the GPU by using the gather function.