predict - Predict response of Gaussian process regression model - MATLAB
Predict response of Gaussian process regression model
Syntax
Description
[ypred](#buto5ym-ypred) = predict([gprMdl](#buto5ym-gprMdl),[Xnew](#buto5ym-Xnew))
returns the predicted responses ypred for the Gaussian process regression (GPR) model gprMdl and the predictor values in Xnew.
[[ypred](#buto5ym-ypred),[ysd](#buto5ym-ysd),[yint](#buto5ym-yint)] = predict([gprMdl](#buto5ym-gprMdl),[Xnew](#buto5ym-Xnew))
also returns the standard deviations ysd and 95% prediction intervals yint of the response variable, evaluated at each observation in Xnew using the trained GPR model.
[[ypred](#buto5ym-ypred),[ysd](#buto5ym-ysd),[yint](#buto5ym-yint)] = predict([gprMdl](#buto5ym-gprMdl),[Xnew](#buto5ym-Xnew),[Name,Value](#namevaluepairarguments))
specifies additional options using one or more name-value arguments. For example, specify the significance level for the confidence level of the prediction intervals yint.
Examples
Generate the sample data.
n = 10000;
rng(1) % For reproducibility
x = linspace(0.5,2.5,n)';
y = sin(10*pi.*x) ./ (2.*x) + (x-1).^4 + 1.5*rand(n,1);
Fit a GPR model using the Matern 3/2 kernel function with separate length scale for each predictor and an active set size of 100. Use the subset of regressors approximation method for parameter estimation and fully independent conditional method for prediction.
gprMdl = fitrgp(x,y,'KernelFunction','ardmatern32', ...
    'ActiveSetSize',100,'FitMethod','sr','PredictMethod','fic');
Compute the predictions.
[ypred,~,yci] = predict(gprMdl,x);
Plot the data along with the predictions and prediction intervals.
plot(x,y,'r.')
hold on
plot(x,ypred,'b-')
plot(x,yci(:,1),'k--')
plot(x,yci(:,2),'k--')
xlabel('x')
ylabel('y')
legend('True responses','GPR predictions', ...
    'Prediction interval limits','Location','best')
Load the sample data and store it in a table.
load fisheriris
tbl = table(meas(:,1),meas(:,2),meas(:,3),meas(:,4),species,...
    'VariableNames',{'meas1','meas2','meas3','meas4','species'});
Fit a GPR model using the first measurement as the response and the other variables as the predictors.
mdl = fitrgp(tbl,'meas1');
Compute the predictions and the 99% prediction intervals.
[ypred,~,yci] = predict(mdl,tbl,'Alpha',0.01);
Plot the true response and the predictions along with the prediction intervals.
figure();
plot(mdl.Y,'r.');
hold on;
plot(ypred);
plot(yci(:,1),'k:');
plot(yci(:,2),'k:');
legend('True response','GPR predictions',...
    'Lower prediction limit','Upper prediction limit',...
    'Location','Best');
Load the sample data.
The data contains training and test data. There are 500 observations in training data and 100 observations in test data. The data has 6 predictor variables. This is simulated data.
Fit a GPR model using the squared exponential kernel function with a separate length scale for each predictor. Standardize predictors in the training data. Use the exact fitting and prediction methods.
gprMdl = fitrgp(Xtrain,ytrain,'Basis','constant',...
    'FitMethod','exact','PredictMethod','exact',...
    'KernelFunction','ardsquaredexponential','Standardize',1);
Predict the responses for test data.
[ytestpred,~,ytestci] = predict(gprMdl,Xtest);
Plot the test response along with the predictions.
figure;
plot(ytest,'r');
hold on;
plot(ytestpred,'b');
plot(ytestci(:,1),'k:');
plot(ytestci(:,2),'k:');
legend('Actual response','GPR predictions',...
    '95% lower','95% upper','Location','Best');
hold off
Input Arguments
Gaussian process regression model, specified as a RegressionGP (full) or CompactRegressionGP (compact) object.
New values for the predictors that fitrgp uses in training the GPR model, specified as a table or an _m_-by-_d_ matrix, where _m_ is the number of observations and _d_ is the number of predictor variables in the training data.
If you trained gprMdl on a table, then Xnew must be a table that contains all the predictor variables used to train gprMdl.
If you trained gprMdl on a matrix, then Xnew must be a numeric matrix with _d_ columns.
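The two cases above can be sketched as follows. This is a minimal illustration on synthetic data; the variable names x1, x2, y, tblTrain, and tblNew are made up for the example and do not come from any shipped data set.

```matlab
rng(0)                                   % for reproducibility
x1 = rand(50,1);
x2 = rand(50,1);
y  = sin(2*pi*x1) + x2 + 0.1*randn(50,1);

% Case 1: train on a table ('y' names the response variable)
tblTrain = table(x1,x2,y);
mdlTbl = fitrgp(tblTrain,'y');

% New data must then be a table with the predictor variables x1 and x2
tblNew = table([0.2;0.8],[0.5;0.1],'VariableNames',{'x1','x2'});
ypredTbl = predict(mdlTbl,tblNew);

% Case 2: train on a matrix with d = 2 columns
mdlMat = fitrgp([x1 x2],y);

% New data must then be a numeric matrix with 2 columns
ypredMat = predict(mdlMat,[0.2 0.5; 0.8 0.1]);
```

In both cases the result is a column vector with one predicted response per row of the new data.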
Data Types: single | double | table
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: predict(gprMdl,Xnew,"Alpha",0.1) specifies the confidence level of the prediction intervals to be 90%.
Significance level for the confidence level of the prediction intervals yint, specified as a numeric scalar in the range [0,1]. The confidence level of yint is equal to 100(1 – Alpha)%.
Example: 'Alpha',0.01 specifies to return 99% prediction intervals.
Data Types: single | double
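As a rough illustration of how Alpha affects yint, the sketch below fits a model to synthetic data and compares the default 95% intervals with 99% intervals. The last check assumes the intervals are symmetric normal-quantile intervals around ypred; that assumption is not stated on this page, so treat the comparison as exploratory.

```matlab
rng(0)                                   % for reproducibility
x = linspace(0,1,60)';
y = sin(2*pi*x) + 0.2*randn(60,1);       % synthetic data for illustration
gprMdl = fitrgp(x,y);

[ypred,ysd,yint95] = predict(gprMdl,x);               % default: 95% intervals
[~,~,yint99]       = predict(gprMdl,x,'Alpha',0.01);  % 99% intervals

% A smaller Alpha gives a higher confidence level and wider intervals
width95 = yint95(:,2) - yint95(:,1);
width99 = yint99(:,2) - yint99(:,1);
allWider = all(width99 > width95)

% Assumption (not confirmed here): yint = ypred +/- z*ysd with normal
% quantiles z. If that holds, this difference is close to zero.
yintGuess = ypred + ysd*norminv([0.025 0.975]);
maxDiff = max(abs(yint95(:) - yintGuess(:)))
```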
Since R2023b
Predicted response value to use for observations with missing predictor values, specified as "median", "mean", or a numeric scalar.
Value | Description |
---|---|
"median" | predict uses the median of the observed response values in the training data as the predicted response value for observations with missing predictor values. |
"mean" | predict uses the mean of the observed response values in the training data as the predicted response value for observations with missing predictor values. |
Numeric scalar | predict uses this value as the predicted response value for observations with missing predictor values. |
Example: PredictionForMissingValue="mean"
Example: PredictionForMissingValue=NaN
Data Types: single | double | char | string
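The options in the table above can be tried on a small synthetic example (R2023b or later). The data and variable names are made up for illustration; the second row of Xnew contains a missing predictor value.

```matlab
rng(0)                                   % for reproducibility
X = rand(40,2);
y = X(:,1) + 2*X(:,2) + 0.1*randn(40,1); % synthetic data for illustration
gprMdl = fitrgp(X,y);

Xnew = [0.5 0.5; NaN 0.3];               % row 2 has a missing predictor value

% Default behavior: training-set median for the row with the missing value
ypredMedian = predict(gprMdl,Xnew);

% Training-set mean instead of the median
ypredMean = predict(gprMdl,Xnew,PredictionForMissingValue="mean");

% A numeric scalar; NaN marks the row instead of filling it in
ypredNaN = predict(gprMdl,Xnew,PredictionForMissingValue=NaN);

% Compare the filled-in values with the training response statistics
[ypredMedian(2) median(y); ypredMean(2) mean(y)]
```

Only the row with the missing value is affected; the first row receives the usual GPR prediction in all three calls.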
Output Arguments
Predicted responses, returned as a column vector of length _n_, where _n_ is the number of observations in the predictor data Xnew.
Standard deviations of the response variable, evaluated at each observation in the predictor data Xnew, returned as a column vector of length _n_, where _n_ is the number of observations in Xnew. The ith element ysd(i) contains the standard deviation of the ith response for the ith observation `Xnew`(i,:), estimated using the trained GPR model gprMdl.
Prediction intervals of the response variable, evaluated at each observation in the predictor data Xnew, returned as an _n_-by-2 matrix, where _n_ is the number of observations in Xnew. The ith row yint(i,:) contains the 100(1 – [Alpha](#buto5ym-Alpha))% prediction interval of the ith response for the ith observation `Xnew`(i,:). The Alpha value is the probability that the prediction interval does not contain the true response value for `Xnew`(i,:). The first column of yint contains the lower limits of the prediction intervals, and the second column contains the upper limits.
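A short check of the output shapes described above, using synthetic data with three predictors and five new observations (all names and values here are illustrative):

```matlab
rng(0)                                   % for reproducibility
X = rand(80,3);
y = X*[1;-2;0.5] + 0.1*randn(80,1);      % synthetic data for illustration
gprMdl = fitrgp(X,y);

Xnew = rand(5,3);                        % n = 5 new observations
[ypred,ysd,yint] = predict(gprMdl,Xnew);

size(ypred)                              % n-by-1
size(ysd)                                % n-by-1
size(yint)                               % n-by-2: [lower upper]

% Each prediction lies between its lower and upper interval limits
isInside = all(yint(:,1) <= ypred & ypred <= yint(:,2))
```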
Tips
- You can choose the prediction method while training the GPR model using the PredictMethod name-value pair argument in fitrgp. The default prediction method is 'exact' for n ≤ 10000, where _n_ is the number of observations in the training data, and 'bcd' (block coordinate descent) otherwise.
- Computation of standard deviations, ysd, and prediction intervals, yint, is not supported when PredictMethod is 'bcd'.
- If gprMdl is a CompactRegressionGP object, you cannot compute standard deviations, ysd, or prediction intervals, yint, for PredictMethod equal to 'sr' or 'fic'. To compute ysd and yint for PredictMethod equal to 'sr' or 'fic', use the full regression (RegressionGP) object.
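One way to respect these restrictions in a script is to inspect the PredictMethod property of the trained model before requesting ysd and yint. The following is a sketch on synthetic data, not a required pattern; the data and the active set size are arbitrary choices for the example.

```matlab
rng(0)                                   % for reproducibility
x = linspace(0,1,200)';
y = cos(4*pi*x) + 0.1*randn(200,1);      % synthetic data for illustration

% Full model trained with the 'sr' fit method and 'fic' prediction method
gprMdl = fitrgp(x,y,'FitMethod','sr','PredictMethod','fic', ...
    'ActiveSetSize',50);

% Request ysd and yint only when the prediction method supports them
if strcmp(gprMdl.PredictMethod,'bcd')
    ypred = predict(gprMdl,x);           % point predictions only
else
    [ypred,ysd,yint] = predict(gprMdl,x);
end

% With a compact model and 'fic', request only the point predictions
cMdl = compact(gprMdl);
ypredCompact = predict(cMdl,x);
```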
Alternatives
You can use resubPredict to compute the predicted responses for the trained GPR model at the observations in the training data.
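As a quick comparison, the sketch below calls resubPredict and then predict on the same training data. The data is synthetic, and the two results are expected to agree up to numerical tolerance.

```matlab
rng(0)                                   % for reproducibility
x = linspace(0,1,50)';
y = exp(-x).*sin(6*x) + 0.05*randn(50,1);  % synthetic data for illustration
gprMdl = fitrgp(x,y);

ypredResub = resubPredict(gprMdl);       % predictions at the training data
ypredTrain = predict(gprMdl,x);          % same inputs passed explicitly

maxDiff = max(abs(ypredResub - ypredTrain))
```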
Simulink Block
To integrate the prediction of a Gaussian process regression model into Simulink®, you can use the RegressionGP Predict block in the Statistics and Machine Learning Toolbox™ library or a MATLAB® Function block with the predict function. For examples, see Predict Responses Using RegressionGP Predict Block and Predict Class Labels Using MATLAB Function Block.
When deciding which approach to use, consider the following:
- If you use the Statistics and Machine Learning Toolbox library block, you can use the Fixed-Point Tool (Fixed-Point Designer) to convert a floating-point model to fixed point.
- Support for variable-size arrays must be enabled for a MATLAB Function block with the predict function.
- If you use a MATLAB Function block, you can use MATLAB functions for preprocessing or post-processing before or after predictions in the same MATLAB Function block.
Extended Capabilities
The predict function fully supports tall arrays. For more information, see Tall Arrays.
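A minimal sketch of prediction on a tall array follows. The model is trained on in-memory synthetic data, and a small in-memory matrix is converted with tall only to keep the example short; in practice the tall array would usually be backed by a datastore.

```matlab
rng(0)                                   % for reproducibility
X = rand(100,2);
y = X(:,1).^2 + X(:,2) + 0.1*randn(100,1);  % synthetic data for illustration
gprMdl = fitrgp(X,y);

tXnew = tall(rand(20,2));                % tall array of new predictor values
tYpred = predict(gprMdl,tXnew);          % evaluation is deferred

ypred = gather(tYpred);                  % run the computation, bring into memory
```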
Usage notes and limitations:
- Use saveLearnerForCoder, loadLearnerForCoder, and codegen (MATLAB Coder) to generate code for the predict function. Save a trained model by using saveLearnerForCoder. Define an entry-point function that loads the saved model by using loadLearnerForCoder and calls the predict function. Then use codegen to generate code for the entry-point function.
- For single-precision code generation, use standardized data by specifying 'Standardize',true when you train the model. To generate single-precision C/C++ code for predict, specify DataType="single" when you call the loadLearnerForCoder function.
- This table contains notes about the arguments of predict. Arguments not included in this table are fully supported.

Argument | Notes and Limitations |
---|---|
Mdl | For the usage notes and limitations of the model object, see Code Generation of the CompactRegressionGP object. |
Xnew | Xnew must be a single-precision or double-precision matrix or a table containing numeric variables, categorical variables, or both. The number of rows, or observations, in Xnew can be a variable size, but the number of columns in Xnew must be fixed. If you want to specify Xnew as a table, then your model must be trained using a table, and you must ensure that your entry-point function for prediction: (1) accepts data as arrays; (2) creates a table from the data input arguments and specifies the variable names in the table; (3) passes the table to predict. For an example of this table workflow, see Generate Code to Classify Data in Table. For more information on using tables in code generation, see Code Generation for Tables (MATLAB Coder) and Table Limitations for Code Generation (MATLAB Coder). |
Name-value arguments | Names in name-value arguments must be compile-time constants. For example, to allow a user-defined significance level in the generated code, include {coder.Constant('Alpha'),0} in the -args value of codegen (MATLAB Coder). If the value of PredictionForMissingValue is nonnumeric, then it must be a compile-time constant. |
For more information, see Introduction to Code Generation.
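The workflow in the usage notes can be sketched as follows. The function name predictGPR and the file name GPRModel are hypothetical names chosen for this example, and the sketch requires MATLAB Coder. First, an entry-point function saved in its own file, predictGPR.m:

```matlab
function ypred = predictGPR(Xnew) %#codegen
% predictGPR  Hypothetical entry-point function for code generation.
% Loads the saved model and returns predicted responses for Xnew.
Mdl = loadLearnerForCoder('GPRModel');   % 'GPRModel' is an assumed file name
ypred = predict(Mdl,Xnew);
end
```

Then, at the command line, train a model on synthetic data, save it, and generate code. The input type allows a variable number of rows and a fixed number of columns, as the table above requires.

```matlab
rng(0)                                   % for reproducibility
X = rand(100,3);
y = X*[1;2;3] + 0.1*randn(100,1);        % synthetic data for illustration
gprMdl = fitrgp(X,y);

saveLearnerForCoder(gprMdl,'GPRModel');  % writes GPRModel.mat

% Variable-size rows (unbounded), fixed 3 columns
xType = coder.typeof(0,[Inf 3],[1 0]);
codegen predictGPR -args {xType}         % builds predictGPR_mex by default

% Compare the generated MEX function with predict in MATLAB
ypredMex = predictGPR_mex(X(1:5,:));
ypredRef = predict(gprMdl,X(1:5,:));
maxDiff = max(abs(ypredMex - ypredRef))
```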
Version History
Introduced in R2015b
Starting in R2023b, when you predict or compute the loss, some regression models allow you to specify the predicted response value for observations with missing predictor values. Specify the PredictionForMissingValue name-value argument to use a numeric scalar, the training set median, or the training set mean as the predicted value. When computing the loss, you can also specify to omit observations with missing predictor values.
This table lists the object functions that support the PredictionForMissingValue name-value argument. By default, the functions use the training set median as the predicted response value for observations with missing predictor values.
Model Type | Model Objects | Object Functions |
---|---|---|
Gaussian process regression (GPR) model | RegressionGP, CompactRegressionGP | loss, predict, resubLoss, resubPredict |
Gaussian process regression (GPR) model | RegressionPartitionedGP | kfoldLoss, kfoldPredict |
Gaussian kernel regression model | RegressionKernel | loss, predict |
Gaussian kernel regression model | RegressionPartitionedKernel | kfoldLoss, kfoldPredict |
Linear regression model | RegressionLinear | loss, predict |
Linear regression model | RegressionPartitionedLinear | kfoldLoss, kfoldPredict |
Neural network regression model | RegressionNeuralNetwork, CompactRegressionNeuralNetwork | loss, predict, resubLoss, resubPredict |
Neural network regression model | RegressionPartitionedNeuralNetwork | kfoldLoss, kfoldPredict |
Support vector machine (SVM) regression model | RegressionSVM, CompactRegressionSVM | loss, predict, resubLoss, resubPredict |
Support vector machine (SVM) regression model | RegressionPartitionedSVM | kfoldLoss, kfoldPredict |
In previous releases, the regression model loss and predict functions listed above used NaN predicted response values for observations with missing predictor values. The software omitted observations with missing predictor values from the resubstitution ("resub") and cross-validation ("kfold") computations for prediction and loss.