cvshrink - Cross-validate pruning and regularization of regression ensemble - MATLAB
Cross-validate pruning and regularization of regression ensemble
Syntax
Description
vals = cvshrink(ens) returns an L-by-T matrix with cross-validated values of the mean squared error. L is the number of Lambda values in the ens.Regularization structure. T is the number of Threshold values on weak learner weights. If ens does not have a Regularization property containing values specified by the regularize function, set the Lambda name-value argument.
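The two ways of supplying Lambda described above can be sketched as follows. This is a minimal sketch, not a definitive recipe: it assumes R2021a or later for the Name=Value syntax, assumes that regularize accepts Lambda as a name-value argument and stores the values in ens.Regularization, and reuses the carsmall setup from the Examples section.

```matlab
% Sketch: store Lambda values with regularize, then call cvshrink without Lambda.
load carsmall
X = [Displacement Horsepower Weight];
rng("default") % For reproducibility
t = templateTree(Reproducible=true);
ens = fitrensemble(X,MPG,Method="Bag",Learners=t);

% Assumption: regularize fills ens.Regularization with the Lambda values used.
ens = regularize(ens,Lambda=[0.01 0.1 1]);

% cvshrink can now take the Lambda values from ens.Regularization,
% so only Threshold is passed here.
vals = cvshrink(ens,Threshold=[0 0.01 0.1]);
disp(size(vals)) % One row per Lambda value, one column per Threshold value
```

If ens.Regularization is empty instead, pass the same vector directly, as in cvshrink(ens,Lambda=[0.01 0.1 1],Threshold=[0 0.01 0.1]).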
[vals,nlearn] = cvshrink(ens) additionally returns an L-by-T matrix of the mean number of learners in the cross-validated ensemble.
[___] = cvshrink(ens,Name=Value) specifies additional options using one or more name-value arguments. For example, you can specify the number of folds to use, the fraction of data to use for holdout validation, and lower cutoffs on weights for weak learners.
Examples
Create a regression ensemble for predicting mileage from the carsmall data. Cross-validate the ensemble.
Load the carsmall data set and select displacement, horsepower, and vehicle weight as predictors.
load carsmall
X = [Displacement Horsepower Weight];
You can train an ensemble of bagged regression trees.
ens = fitrensemble(X,MPG,Method="Bag")
fitrensemble uses a default template tree object templateTree() as a weak learner when Method is "Bag". In this example, for reproducibility, specify Reproducible=true when you create a tree template object, and then use the object as a weak learner.
rng('default') % For reproducibility
t = templateTree(Reproducible=true); % For reproducibility of random predictor selections
ens = fitrensemble(X,MPG,Method="Bag",Learners=t);
Specify values for Lambda and Threshold. Use these values to cross-validate the ensemble.
[vals,nlearn] = cvshrink(ens,Lambda=[.01 .1 1],Threshold=[0 .01 .1])
vals = 3×3
   18.9150   19.0092  128.5935
   18.9099   18.9504  128.8449
   19.0328   18.9636  116.8500
nlearn = 3×3
   13.7000   11.6000    4.1000
   13.7000   11.7000    4.1000
   13.9000   11.6000    4.1000
Clearly, setting a threshold of 0.1 leads to unacceptable errors, while a threshold of 0.01 gives similar errors to a threshold of 0. The mean number of learners with a threshold of 0.01 is about 11.6, whereas the mean number is about 13.8 when the threshold is 0.
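The comparison above can also be done programmatically. The following sketch hard-codes the vals and nlearn matrices printed in this example, so it runs without retraining. The variable names and the 1% tolerance are illustrative choices, not part of cvshrink.

```matlab
% Values copied from the example output above (rows: Lambda, columns: Threshold).
lambdas    = [0.01 0.1 1];
thresholds = [0 0.01 0.1];
vals   = [18.9150 19.0092 128.5935
          18.9099 18.9504 128.8449
          19.0328 18.9636 116.8500];
nlearn = [13.7 11.6 4.1
          13.7 11.7 4.1
          13.9 11.6 4.1];

% Pair with the lowest cross-validated mean squared error.
[minErr,idx] = min(vals(:));
[iL,iT] = ind2sub(size(vals),idx);
bestLambda = lambdas(iL);
bestThreshold = thresholds(iT);

% Smallest ensemble whose error is within 1% of the minimum
% (the 1% tolerance is an arbitrary illustrative choice).
candidates = nlearn;
candidates(vals > 1.01*minErr) = Inf;
[~,jdx] = min(candidates(:));
[jL,jT] = ind2sub(size(vals),jdx);
smallLambda = lambdas(jL);
smallThreshold = thresholds(jT);

fprintf("Lowest error: Lambda=%g, Threshold=%g\n",bestLambda,bestThreshold)
fprintf("Smallest within 1%%: Lambda=%g, Threshold=%g\n",smallLambda,smallThreshold)
```

With these values, the lowest error occurs at a threshold of 0, while the tolerance-based choice moves to a threshold of 0.01 with fewer learners. The selected pair could then be passed to the shrink function to produce a compact ensemble, assuming shrink accepts the same Lambda and Threshold name-value arguments.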
Input Arguments
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: cvshrink(ens,Holdout=0.1,Threshold=[0 .01 .1]) specifies to reserve 10% of the data for holdout validation and to use lower cutoffs of 0, 0.01, and 0.1 on the weak learner weights.
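As a sketch of the two calling conventions described above, the same holdout request can be written in either form. The carsmall setup is borrowed from the Examples section, and Lambda is passed explicitly because this ensemble has not been regularized. Holdout validation draws a random partition, so the two calls are not expected to return identical numbers unless the random number generator is reset between them.

```matlab
load carsmall
X = [Displacement Horsepower Weight];
rng("default") % For reproducibility
t = templateTree("Reproducible",true);
ens = fitrensemble(X,MPG,"Method","Bag","Learners",t);

% R2021a and later: Name=Value syntax.
valsNew = cvshrink(ens,Lambda=[0.01 0.1 1],Holdout=0.1,Threshold=[0 0.01 0.1]);

% Before R2021a: comma-separated pairs with quoted names.
valsOld = cvshrink(ens,"Lambda",[0.01 0.1 1],"Holdout",0.1,"Threshold",[0 0.01 0.1]);

disp(size(valsNew))
disp(size(valsOld))
```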
Data Types: double | single
Data Types: single | double
Lambda: Regularization parameter values for lasso, specified as a vector of nonnegative scalar values. If the value of this argument is empty, cvshrink does not perform cross-validation.
Example: Lambda=[.01 .1 1]
Data Types: single | double
Data Types: char | string
Threshold: Weights threshold, specified as a numeric vector with lower cutoffs on weights for weak learners. cvshrink discards learners with weights below Threshold in its cross-validation calculation.
Example: Threshold=[0 .01 .1]
Data Types: single | double
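To illustrate what a lower cutoff on weights means, the following sketch counts how many learners survive each threshold for a made-up weight vector. The weights are hypothetical and the comparison only illustrates the idea of discarding weights below a cutoff. It is not cvshrink's internal code.

```matlab
% Hypothetical learner weights (not taken from a trained ensemble).
w = [0.30 0.005 0.05 0.20 0.445];
thresholds = [0 0.01 0.1];

% For each threshold, count the learners whose weight is not below the cutoff.
kept = arrayfun(@(t) sum(w >= t),thresholds);
disp(kept) % 5 4 3
```

Larger thresholds keep fewer learners, which is why each additional Threshold value adds a column to the cvshrink outputs.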
Output Arguments
vals: Cross-validated values of the mean squared error, returned as an L-by-T numeric matrix. L is the number of values of the regularization parameter Lambda, and T is the number of Threshold values on weak learner weights.
nlearn: Mean number of learners in the cross-validated ensemble, returned as an L-by-T numeric matrix. L is the number of values of the regularization parameter Lambda, and T is the number of Threshold values on weak learner weights.
Extended Capabilities
Version History
Introduced in R2011a