RegressionKernel

Gaussian kernel regression model using random feature expansion

Description

RegressionKernel is a trained model object for Gaussian kernel regression using random feature expansion. RegressionKernel is more practical for big data applications that have large training sets but can also be applied to smaller data sets that fit in memory.

Unlike other regression models, and for economical memory usage, RegressionKernel model objects do not store the training data. However, they do store information such as the dimension of the expanded space, the kernel scale parameter, and the regularization strength.

You can use trained RegressionKernel models to continue training using the training data, predict responses for new data, and compute the mean squared error or epsilon-insensitive loss. For details, see resume, predict, and loss.

Creation

Create a RegressionKernel object using the fitrkernel function. This function maps data in a low-dimensional space into a high-dimensional space, then fits a linear model in the high-dimensional space by minimizing the regularized objective function. Obtaining the linear model in the high-dimensional space is equivalent to applying the Gaussian kernel to the model in the low-dimensional space. Available linear regression models include regularized support vector machines (SVM) and least-squares regression models.
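For instance, here is a minimal sketch of the creation workflow; the variables X (an n-by-p numeric predictor matrix) and Y (an n-by-1 numeric response vector) are illustrative assumptions, not part of this page:

% Minimal sketch: create a kernel regression model and query key properties.
Mdl = fitrkernel(X,Y);            % default Learner is 'svm'
d = Mdl.NumExpansionDimensions;   % dimension of the expanded space
lambda = Mdl.Lambda;              % regularization term strength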

Properties


Kernel Regression Properties

Epsilon — Half the width of the epsilon-insensitive band, specified as a nonnegative scalar.

If Learner is not 'svm', then Epsilon is an empty array ([]).

Data Types: single | double

Learner — Linear regression model type, specified as 'leastsquares' or 'svm'.

In the following table, f(x) = T(x)β + b, where x is an observation (row vector) from p predictor variables, T(·) is a transformation of an observation for feature expansion, β is a vector of coefficients, and b is the scalar bias.

Value | Algorithm | Loss Function | FittedLoss Value
'svm' | Support vector machine regression | Epsilon insensitive: ℓ[y,f(x)] = max[0, |y − f(x)| − ε] | 'epsiloninsensitive'
'leastsquares' | Linear regression through ordinary least squares | Mean squared error (MSE): ℓ[y,f(x)] = (1/2)[y − f(x)]² | 'mse'
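As a quick check on these formulas, the following sketch evaluates both loss functions for a single observation; the numeric values are illustrative:

% Sketch: evaluate both loss functions for one observation.
y = 3; fx = 2.5; epsilon = 0.2;            % illustrative values
svmLoss = max(0, abs(y - fx) - epsilon);   % epsilon-insensitive loss: 0.3
lsLoss = 0.5*(y - fx)^2;                   % mean squared error loss: 0.125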

NumExpansionDimensions — Number of dimensions of the expanded space, specified as a positive integer.

Data Types: single | double

KernelScale — Kernel scale parameter, specified as a positive scalar.

Data Types: single | double

BoxConstraint — Box constraint, specified as a positive scalar.

Data Types: double | single

Lambda — Regularization term strength, specified as a nonnegative scalar.

Data Types: single | double

Since R2023b

This property is read-only.

Mu — Predictor means, specified as a numeric vector. If you specify Standardize as 1 or true when you train the kernel model, then the length of the Mu vector is equal to the number of expanded predictors (see ExpandedPredictorNames). The vector contains 0 values for dummy variables corresponding to expanded categorical predictors.

If you set Standardize to 0 or false when you train the kernel model, then the Mu value is an empty vector ([]).

Data Types: double

Since R2023b

This property is read-only.

Sigma — Predictor standard deviations, specified as a numeric vector. If you specify Standardize as 1 or true when you train the kernel model, then the length of the Sigma vector is equal to the number of expanded predictors (see ExpandedPredictorNames). The vector contains 1 values for dummy variables corresponding to expanded categorical predictors.

If you set Standardize to 0 or false when you train the kernel model, then the Sigma value is an empty vector ([]).

Data Types: double
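Conceptually, the model centers and scales each expanded predictor with these statistics. A minimal sketch of the equivalent transformation, assuming x is a row vector of expanded predictor values from a model trained with Standardize set to true:

% Conceptual sketch: the standardization that Mu and Sigma encode.
z = (x - Mdl.Mu) ./ Mdl.Sigma;   % centered and scaled expanded predictors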

FittedLoss — Loss function used to fit the linear model, specified as 'epsiloninsensitive' or 'mse'.

Value | Algorithm | Loss Function | Learner Value
'epsiloninsensitive' | Support vector machine regression | Epsilon insensitive: ℓ[y,f(x)] = max[0, |y − f(x)| − ε] | 'svm'
'mse' | Linear regression through ordinary least squares | Mean squared error (MSE): ℓ[y,f(x)] = (1/2)[y − f(x)]² | 'leastsquares'

Other Regression Properties

CategoricalPredictors — Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).

Data Types: single | double

ModelParameters — Parameters used for training the RegressionKernel model, specified as a structure.

Access fields of ModelParameters using dot notation. For example, access the relative tolerance on the linear coefficients and the bias term by using Mdl.ModelParameters.BetaTolerance.

Data Types: struct

PredictorNames — Predictor names in order of their appearance in the predictor data, specified as a cell array of character vectors. The length of PredictorNames is equal to the number of columns used as predictor variables in the training data X or Tbl.

Data Types: cell

ResponseName — Response variable name, specified as a character vector.

Data Types: char

ResponseTransform — Response transformation function to apply to predicted responses, specified as 'none' or a function handle.

For kernel regression models, before the response transformation, the predicted response for an observation x (row vector) is f(x) = T(x)β + b.

For a MATLAB® function or a function that you define, enter its function handle. For example, you can enter Mdl.ResponseTransform = @function, where function accepts a numeric vector of the original responses and returns a numeric vector of the same size containing the transformed responses.

Data Types: char | function_handle
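For instance, a minimal sketch that sets an exponential back-transformation, which could be appropriate if the model was trained on log responses (the choice of transform is illustrative):

% Sketch: transform predictions back to the original response scale.
Mdl.ResponseTransform = @(y) exp(y);   % predict applies this to its output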

Object Functions

gather Gather properties of Statistics and Machine Learning Toolbox object from GPU
incrementalLearner Convert kernel regression model to incremental learner
lime Local interpretable model-agnostic explanations (LIME)
loss Regression loss for Gaussian kernel regression model
partialDependence Compute partial dependence
plotPartialDependence Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
predict Predict responses for Gaussian kernel regression model
resume Resume training of Gaussian kernel regression model
shapley Shapley values

Examples


Train a kernel regression model for a tall array by using SVM.

When you perform calculations on tall arrays, MATLAB® uses either a parallel pool (default if you have Parallel Computing Toolbox™) or the local MATLAB session. To run the example using the local MATLAB session when you have Parallel Computing Toolbox, change the global execution environment by using the mapreducer function.

Create a datastore that references the folder location with the data. The data can be contained in a single file, a collection of files, or an entire folder. Treat 'NA' values as missing data so that datastore replaces them with NaN values. Select a subset of the variables to use. Create a tall table on top of the datastore.

varnames = {'ArrTime','DepTime','ActualElapsedTime'};
ds = datastore('airlinesmall.csv','TreatAsMissing','NA', ...
    'SelectedVariableNames',varnames);
t = tall(ds);

Specify DepTime and ArrTime as the predictor variables (X) and ActualElapsedTime as the response variable (Y). Select the observations for which ArrTime is later than DepTime.

daytime = t.ArrTime>t.DepTime;
Y = t.ActualElapsedTime(daytime); % Response data
X = t{daytime,{'DepTime' 'ArrTime'}}; % Predictor data

Standardize the predictor variables.

Z = zscore(X); % Standardize the data

Train a default Gaussian kernel regression model with the standardized predictors. Extract a fit summary to determine how well the optimization algorithm fits the model to the data.

[Mdl,FitInfo] = fitrkernel(Z,Y)

Found 6 chunks.
|=========================================================================|
|  Solver | Iteration / |  Objective   |   Gradient   | Beta relative |
|         |  Data Pass  |              |   magnitude  |    change     |
|=========================================================================|
|    INIT |     0 /  1  | 4.307833e+01 | 9.925486e-02 |           NaN |
|   LBFGS |     0 /  2  | 2.782790e+01 | 7.202403e-03 |  9.891473e-01 |
|   LBFGS |     1 /  3  | 2.781351e+01 | 1.806211e-02 |  3.220672e-03 |
|   LBFGS |     2 /  4  | 2.777773e+01 | 2.727737e-02 |  9.309939e-03 |
|   LBFGS |     3 /  5  | 2.768591e+01 | 2.951422e-02 |  2.833343e-02 |
|   LBFGS |     4 /  6  | 2.755857e+01 | 5.124144e-02 |  7.935278e-02 |
|   LBFGS |     5 /  7  | 2.738896e+01 | 3.089571e-02 |  4.644920e-02 |
|   LBFGS |     6 /  8  | 2.716704e+01 | 2.552696e-02 |  8.596406e-02 |
|   LBFGS |     7 /  9  | 2.696409e+01 | 3.088621e-02 |  1.263589e-01 |
|   LBFGS |     8 / 10  | 2.676203e+01 | 2.021303e-02 |  1.533927e-01 |
|   LBFGS |     9 / 11  | 2.660322e+01 | 1.221361e-02 |  1.351968e-01 |
|   LBFGS |    10 / 12  | 2.645504e+01 | 1.486501e-02 |  1.175476e-01 |
|   LBFGS |    11 / 13  | 2.631323e+01 | 1.772835e-02 |  1.161909e-01 |
|   LBFGS |    12 / 14  | 2.625264e+01 | 5.837906e-02 |  1.422851e-01 |
|   LBFGS |    13 / 15  | 2.619281e+01 | 1.294441e-02 |  2.966283e-02 |
|   LBFGS |    14 / 16  | 2.618220e+01 | 3.791806e-03 |  9.051274e-03 |
|   LBFGS |    15 / 17  | 2.617989e+01 | 3.689255e-03 |  6.364132e-03 |
|   LBFGS |    16 / 18  | 2.617426e+01 | 4.200232e-03 |  1.213026e-02 |
|   LBFGS |    17 / 19  | 2.615914e+01 | 7.339928e-03 |  2.803348e-02 |
|   LBFGS |    18 / 20  | 2.620704e+01 | 2.298098e-02 |  1.749830e-01 |
|=========================================================================|
|  Solver | Iteration / |  Objective   |   Gradient   | Beta relative |
|         |  Data Pass  |              |   magnitude  |    change     |
|=========================================================================|
|   LBFGS |    18 / 21  | 2.615554e+01 | 1.164689e-02 |  8.580878e-02 |
|   LBFGS |    19 / 22  | 2.614367e+01 | 3.395507e-03 |  3.938314e-02 |
|   LBFGS |    20 / 23  | 2.614090e+01 | 2.349246e-03 |  1.495049e-02 |
|=========================================================================|

Mdl = 
  RegressionKernel
              ResponseName: 'Y'
                   Learner: 'svm'
    NumExpansionDimensions: 64
               KernelScale: 1
                    Lambda: 8.5385e-06
             BoxConstraint: 1
                   Epsilon: 5.9303


FitInfo = struct with fields:
                  Solver: 'LBFGS-tall'
            LossFunction: 'epsiloninsensitive'
                  Lambda: 8.5385e-06
           BetaTolerance: 1.0000e-03
       GradientTolerance: 1.0000e-05
          ObjectiveValue: 26.1409
       GradientMagnitude: 0.0023
    RelativeChangeInBeta: 0.0150
                 FitTime: 13.1051
                 History: [1×1 struct]

Mdl is a RegressionKernel model. To inspect the regression error, you can pass Mdl and the training data or new data to the loss function. Or, you can pass Mdl and new predictor data to the predict function to predict responses for new observations. You can also pass Mdl and the training data to the resume function to continue training.
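For instance, here is a minimal sketch of prediction with this model, reusing the standardized tall predictor data Z from above; the gather step assumes you want in-memory results:

% Sketch: predict responses for the standardized predictor data.
YFit = predict(Mdl,Z);
YFit = gather(YFit);   % evaluate the tall array and collect the results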

FitInfo is a structure array containing optimization information. Use FitInfo to determine whether optimization termination measurements are satisfactory.

For improved accuracy, you can increase the maximum number of optimization iterations ('IterationLimit') and decrease the tolerance values ('BetaTolerance' and 'GradientTolerance') by using the name-value pair arguments of fitrkernel. Doing so can improve measures like ObjectiveValue and RelativeChangeInBeta in FitInfo. You can also optimize model parameters by using the 'OptimizeHyperparameters' name-value pair argument.
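A hedged sketch of such a call follows; the specific limit and tolerance values are illustrative, not recommendations:

% Sketch: raise the iteration limit and tighten the tolerances.
[Mdl,FitInfo] = fitrkernel(Z,Y,'IterationLimit',50, ...
    'BetaTolerance',1e-5,'GradientTolerance',1e-6);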

Resume training a Gaussian kernel regression model for more iterations to improve the regression loss.

Load the carbig data set.

load carbig

Specify the predictor variables (X) and the response variable (Y).

X = [Acceleration,Cylinders,Displacement,Horsepower,Weight];
Y = MPG;

Delete rows of X and Y where either array has NaN values. Removing rows with NaN values before passing data to fitrkernel can speed up training and reduce memory usage.

R = rmmissing([X Y]); % Data with missing entries removed
X = R(:,1:5);
Y = R(:,end);

Reserve 10% of the observations as a holdout sample. Extract the training and test indices from the partition definition.

rng(10) % For reproducibility
N = length(Y);
cvp = cvpartition(N,'Holdout',0.1);
idxTrn = training(cvp); % Training set indices
idxTest = test(cvp); % Test set indices

Train a kernel regression model. Standardize the training data, set the iteration limit to 5, and specify 'Verbose',1 to display diagnostic information.

Xtrain = X(idxTrn,:);
Ytrain = Y(idxTrn);

Mdl = fitrkernel(Xtrain,Ytrain,'Standardize',true, ...
    'IterationLimit',5,'Verbose',1)

|=================================================================================================================|
| Solver | Pass | Iteration |  Objective   |     Step     |   Gradient   |    Relative    | sum(beta~=0) |
|        |      |           |              |              |   magnitude  | change in Beta |              |
|=================================================================================================================|
|  LBFGS |    1 |         0 | 5.691016e+00 | 0.000000e+00 | 5.852758e-02 |                |            0 |
|  LBFGS |    1 |         1 | 5.086537e+00 | 8.000000e+00 | 5.220869e-02 |   9.846711e-02 |          256 |
|  LBFGS |    1 |         2 | 3.862301e+00 | 5.000000e-01 | 3.796034e-01 |   5.998808e-01 |          256 |
|  LBFGS |    1 |         3 | 3.460613e+00 | 1.000000e+00 | 3.257790e-01 |   1.615091e-01 |          256 |
|  LBFGS |    1 |         4 | 3.136228e+00 | 1.000000e+00 | 2.832861e-02 |   8.006254e-02 |          256 |
|  LBFGS |    1 |         5 | 3.063978e+00 | 1.000000e+00 | 1.475038e-02 |   3.314455e-02 |          256 |
|=================================================================================================================|

Mdl = 
  RegressionKernel
              ResponseName: 'Y'
                   Learner: 'svm'
    NumExpansionDimensions: 256
               KernelScale: 1
                    Lambda: 0.0028
             BoxConstraint: 1
                   Epsilon: 0.8617


Mdl is a RegressionKernel model.

Estimate the epsilon-insensitive error for the test set.

Xtest = X(idxTest,:);
Ytest = Y(idxTest);

L = loss(Mdl,Xtest,Ytest,'LossFun','epsiloninsensitive')

Continue training the model by using resume. This function continues training with the same options used for training Mdl.

UpdatedMdl = resume(Mdl,Xtrain,Ytrain);

|=================================================================================================================|
| Solver | Pass | Iteration |  Objective   |     Step     |   Gradient   |    Relative    | sum(beta~=0) |
|        |      |           |              |              |   magnitude  | change in Beta |              |
|=================================================================================================================|
|  LBFGS |    1 |         0 | 3.063978e+00 | 0.000000e+00 | 1.475038e-02 |                |          256 |
|  LBFGS |    1 |         1 | 3.007822e+00 | 8.000000e+00 | 1.391637e-02 |   2.603966e-02 |          256 |
|  LBFGS |    1 |         2 | 2.817171e+00 | 5.000000e-01 | 5.949008e-02 |   1.918084e-01 |          256 |
|  LBFGS |    1 |         3 | 2.807294e+00 | 2.500000e-01 | 6.798867e-02 |   2.973097e-02 |          256 |
|  LBFGS |    1 |         4 | 2.791060e+00 | 1.000000e+00 | 2.549575e-02 |   1.639328e-02 |          256 |
|  LBFGS |    1 |         5 | 2.767821e+00 | 1.000000e+00 | 6.154419e-03 |   2.468903e-02 |          256 |
|  LBFGS |    1 |         6 | 2.738163e+00 | 1.000000e+00 | 5.949008e-02 |   9.476263e-02 |          256 |
|  LBFGS |    1 |         7 | 2.719146e+00 | 1.000000e+00 | 1.699717e-02 |   1.849972e-02 |          256 |
|  LBFGS |    1 |         8 | 2.705941e+00 | 1.000000e+00 | 3.116147e-02 |   4.152590e-02 |          256 |
|  LBFGS |    1 |         9 | 2.701162e+00 | 1.000000e+00 | 5.665722e-03 |   9.401466e-03 |          256 |
|  LBFGS |    1 |        10 | 2.695341e+00 | 5.000000e-01 | 3.116147e-02 |   4.968046e-02 |          256 |
|  LBFGS |    1 |        11 | 2.691277e+00 | 1.000000e+00 | 8.498584e-03 |   1.017446e-02 |          256 |
|  LBFGS |    1 |        12 | 2.689972e+00 | 1.000000e+00 | 1.983003e-02 |   9.938921e-03 |          256 |
|  LBFGS |    1 |        13 | 2.688979e+00 | 1.000000e+00 | 1.416431e-02 |   6.606316e-03 |          256 |
|  LBFGS |    1 |        14 | 2.687787e+00 | 1.000000e+00 | 1.621956e-03 |   7.089542e-03 |          256 |
|  LBFGS |    1 |        15 | 2.686539e+00 | 1.000000e+00 | 1.699717e-02 |   1.169701e-02 |          256 |
|  LBFGS |    1 |        16 | 2.685356e+00 | 1.000000e+00 | 1.133144e-02 |   1.069310e-02 |          256 |
|  LBFGS |    1 |        17 | 2.685021e+00 | 5.000000e-01 | 1.133144e-02 |   2.104248e-02 |          256 |
|  LBFGS |    1 |        18 | 2.684002e+00 | 1.000000e+00 | 2.832861e-03 |   6.175231e-03 |          256 |
|  LBFGS |    1 |        19 | 2.683507e+00 | 1.000000e+00 | 5.665722e-03 |   3.724026e-03 |          256 |
|  LBFGS |    1 |        20 | 2.683343e+00 | 5.000000e-01 | 5.665722e-03 |   9.549119e-03 |          256 |
|=================================================================================================================|
| Solver | Pass | Iteration |  Objective   |     Step     |   Gradient   |    Relative    | sum(beta~=0) |
|        |      |           |              |              |   magnitude  | change in Beta |              |
|=================================================================================================================|
|  LBFGS |    1 |        21 | 2.682897e+00 | 1.000000e+00 | 5.665722e-03 |   7.172867e-03 |          256 |
|  LBFGS |    1 |        22 | 2.682682e+00 | 1.000000e+00 | 2.832861e-03 |   2.587726e-03 |          256 |
|  LBFGS |    1 |        23 | 2.682485e+00 | 1.000000e+00 | 2.832861e-03 |   2.953648e-03 |          256 |
|  LBFGS |    1 |        24 | 2.682326e+00 | 1.000000e+00 | 2.832861e-03 |   7.777294e-03 |          256 |
|  LBFGS |    1 |        25 | 2.681914e+00 | 1.000000e+00 | 2.832861e-03 |   2.778555e-03 |          256 |
|  LBFGS |    1 |        26 | 2.681867e+00 | 5.000000e-01 | 1.031085e-03 |   3.638352e-03 |          256 |
|  LBFGS |    1 |        27 | 2.681725e+00 | 1.000000e+00 | 5.665722e-03 |   1.515199e-03 |          256 |
|  LBFGS |    1 |        28 | 2.681692e+00 | 5.000000e-01 | 1.314940e-03 |   1.850055e-03 |          256 |
|  LBFGS |    1 |        29 | 2.681625e+00 | 1.000000e+00 | 2.832861e-03 |   1.456903e-03 |          256 |
|  LBFGS |    1 |        30 | 2.681594e+00 | 5.000000e-01 | 2.832861e-03 |   8.704875e-04 |          256 |
|  LBFGS |    1 |        31 | 2.681581e+00 | 5.000000e-01 | 8.498584e-03 |   3.934768e-04 |          256 |
|  LBFGS |    1 |        32 | 2.681579e+00 | 1.000000e+00 | 8.498584e-03 |   1.847866e-03 |          256 |
|  LBFGS |    1 |        33 | 2.681553e+00 | 1.000000e+00 | 9.857038e-04 |   6.509825e-04 |          256 |
|  LBFGS |    1 |        34 | 2.681541e+00 | 5.000000e-01 | 8.498584e-03 |   6.635528e-04 |          256 |
|  LBFGS |    1 |        35 | 2.681499e+00 | 1.000000e+00 | 5.665722e-03 |   6.194735e-04 |          256 |
|  LBFGS |    1 |        36 | 2.681493e+00 | 5.000000e-01 | 1.133144e-02 |   1.617763e-03 |          256 |
|  LBFGS |    1 |        37 | 2.681473e+00 | 1.000000e+00 | 9.869233e-04 |   8.418484e-04 |          256 |
|  LBFGS |    1 |        38 | 2.681469e+00 | 1.000000e+00 | 5.665722e-03 |   1.069722e-03 |          256 |
|  LBFGS |    1 |        39 | 2.681432e+00 | 1.000000e+00 | 2.832861e-03 |   8.501930e-04 |          256 |
|  LBFGS |    1 |        40 | 2.681423e+00 | 2.500000e-01 | 1.133144e-02 |   9.543716e-04 |          256 |
|=================================================================================================================|
| Solver | Pass | Iteration |  Objective   |     Step     |   Gradient   |    Relative    | sum(beta~=0) |
|        |      |           |              |              |   magnitude  | change in Beta |              |
|=================================================================================================================|
|  LBFGS |    1 |        41 | 2.681416e+00 | 1.000000e+00 | 2.832861e-03 |   8.763251e-04 |          256 |
|  LBFGS |    1 |        42 | 2.681413e+00 | 5.000000e-01 | 2.832861e-03 |   4.101888e-04 |          256 |
|  LBFGS |    1 |        43 | 2.681403e+00 | 1.000000e+00 | 5.665722e-03 |   2.713209e-04 |          256 |
|  LBFGS |    1 |        44 | 2.681392e+00 | 1.000000e+00 | 2.832861e-03 |   2.115241e-04 |          256 |
|  LBFGS |    1 |        45 | 2.681383e+00 | 1.000000e+00 | 2.832861e-03 |   2.872858e-04 |          256 |
|  LBFGS |    1 |        46 | 2.681374e+00 | 1.000000e+00 | 8.498584e-03 |   5.771001e-04 |          256 |
|  LBFGS |    1 |        47 | 2.681353e+00 | 1.000000e+00 | 2.832861e-03 |   3.160871e-04 |          256 |
|  LBFGS |    1 |        48 | 2.681334e+00 | 5.000000e-01 | 8.498584e-03 |   1.045502e-03 |          256 |
|  LBFGS |    1 |        49 | 2.681314e+00 | 1.000000e+00 | 7.878714e-04 |   1.505118e-03 |          256 |
|  LBFGS |    1 |        50 | 2.681306e+00 | 1.000000e+00 | 2.832861e-03 |   4.756894e-04 |          256 |
|  LBFGS |    1 |        51 | 2.681301e+00 | 1.000000e+00 | 1.133144e-02 |   3.664873e-04 |          256 |
|  LBFGS |    1 |        52 | 2.681288e+00 | 1.000000e+00 | 2.832861e-03 |   1.449821e-04 |          256 |
|  LBFGS |    1 |        53 | 2.681287e+00 | 2.500000e-01 | 1.699717e-02 |   2.357176e-04 |          256 |
|  LBFGS |    1 |        54 | 2.681282e+00 | 1.000000e+00 | 5.665722e-03 |   2.046663e-04 |          256 |
|  LBFGS |    1 |        55 | 2.681278e+00 | 1.000000e+00 | 2.832861e-03 |   2.546349e-04 |          256 |
|  LBFGS |    1 |        56 | 2.681276e+00 | 2.500000e-01 | 1.307940e-03 |   1.966786e-04 |          256 |
|  LBFGS |    1 |        57 | 2.681274e+00 | 5.000000e-01 | 1.416431e-02 |   1.005310e-04 |          256 |
|  LBFGS |    1 |        58 | 2.681271e+00 | 5.000000e-01 | 1.118892e-03 |   1.147324e-04 |          256 |
|  LBFGS |    1 |        59 | 2.681269e+00 | 1.000000e+00 | 2.832861e-03 |   1.332914e-04 |          256 |
|  LBFGS |    1 |        60 | 2.681268e+00 | 2.500000e-01 | 1.132045e-03 |   5.441369e-05 |          256 |
|=================================================================================================================|

Estimate the epsilon-insensitive error for the test set using the updated model.

UpdatedL = loss(UpdatedMdl,Xtest,Ytest,'LossFun','epsiloninsensitive')

The regression error decreases by a factor of about 0.08 after resume updates the regression model with more iterations.
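You can confirm this figure directly from the two loss values computed above:

% Sketch: relative decrease in the epsilon-insensitive regression error.
relDecrease = (L - UpdatedL)/L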

Extended Capabilities


GPU Arrays: Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

Usage notes and limitations: most RegressionKernel object functions support GPU array inputs; the exceptions are incrementalLearner, lime, and shapley (see Version History).

For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

Version History

Introduced in R2018a


You can fit a RegressionKernel object with GPU arrays by using fitrkernel. Most RegressionKernel object functions now support GPU array input arguments so that the functions can execute on a GPU. The object functions that do not support GPU array inputs are incrementalLearner, lime, and shapley.

You can also gather the properties of a RegressionKernel model object from the GPU using the gather function.
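A minimal sketch of this workflow, assuming Parallel Computing Toolbox and in-memory numeric arrays X and Y (the variable names are illustrative):

% Sketch: fit on the GPU, then gather the model back to host memory.
Xg = gpuArray(X);
Yg = gpuArray(Y);
Mdl = fitrkernel(Xg,Yg);
Mdl = gather(Mdl);   % gather model properties from the GPU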

The fitckernel and fitrkernel functions support the standardization of numeric predictors. That is, in the call to either function, you can specify the Standardize value as true to center and scale each numeric predictor variable. Kernel models include Mu and Sigma properties that contain the means and standard deviations, respectively, used to standardize the predictors before training. The properties are empty when the fitting function does not perform any standardization.
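For example, a minimal sketch, assuming in-memory predictor and response data X and Y:

% Sketch: standardize numeric predictors during training.
Mdl = fitrkernel(X,Y,'Standardize',true);
mu = Mdl.Mu;        % means used to standardize the predictors
sigma = Mdl.Sigma;  % standard deviations used to standardize the predictors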

You can generate C/C++ code for the predict function.
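A hedged sketch of the usual workflow with saveLearnerForCoder and loadLearnerForCoder follows; the file name KernelMdl and the entry-point function name predictKernel are illustrative:

% Sketch: save a trained model for code generation.
saveLearnerForCoder(Mdl,'KernelMdl');   % writes KernelMdl.mat

function yfit = predictKernel(X) %#codegen
% Entry-point function: load the saved model and predict responses.
Mdl = loadLearnerForCoder('KernelMdl');
yfit = predict(Mdl,X);
end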