RegressionTree - Regression tree - MATLAB

Description

A decision tree with binary splits for regression. An object of class RegressionTree can predict responses for new data with the predict method. The object contains the data used for training, so it can also compute resubstitution predictions using resubPredict.

Creation

Create a RegressionTree object by using fitrtree.

Properties

Tree Properties

This property is read-only.

Categorical splits, returned as an n-by-2 cell array, where n is the number of categorical splits in tree. Each row in CategoricalSplit gives left and right values for a categorical split. For each branch node with categorical split j based on a categorical predictor variable z, the left child is chosen if z is in CategoricalSplit(j,1) and the right child is chosen if z is in CategoricalSplit(j,2). The splits are in the same order as nodes of the tree. Nodes for these splits can be found by running cuttype and selecting 'categorical' cuts from top to bottom.

Data Types: cell

This property is read-only.

Numbers of the child nodes for each node in the tree, returned as an n-by-2 array, where n is the number of nodes. Leaf nodes have child node 0.

Data Types: double

This property is read-only.

Categories used at branches in tree, returned as an n-by-2 cell array, where n is the number of nodes. For each branch node i based on a categorical predictor variable X, the left child is chosen if X is among the categories listed in CutCategories{i,1}, and the right child is chosen if X is among those listed in CutCategories{i,2}. Both columns of CutCategories are empty for branch nodes based on continuous predictors and for leaf nodes.

CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.

Data Types: cell

This property is read-only.

Data Types: double

This property is read-only.

Names of the variables used for branching in each node in tree, returned as an n-element cell array, where n is the number of nodes. These variables are sometimes known as cut variables. For leaf nodes, CutPredictor contains an empty character vector.

CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.

Data Types: cell

This property is read-only.

Indices of variables used for branching in each node in tree, returned as an n-element array, where n is the number of nodes. For more information, see CutPredictor.

Data Types: double

This property is read-only.

Type of cut at each node in tree, returned as an n-element cell array, where n is the number of nodes. For each node i, CutType{i} is:

'continuous' if the cut is defined in the form X < v for a variable X and cut point v.
'categorical' if the cut is defined by whether a variable X takes a value in a set of categories.
'' if i is a leaf node.

CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.

Data Types: cell
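As a rough illustration of how CutType and CutCategories relate, the following sketch fits a tree on the carsmall sample data (an assumption made for the example; any data with a categorical predictor would serve) and displays the category sets for the categorical branch nodes. The variable names catNodes and leafNodes are illustrative only.

```matlab
% Fit a small tree with one continuous and one categorical predictor.
load carsmall
tree = fitrtree([Weight, Cylinders],MPG, ...
    'CategoricalPredictors',2,'PredictorNames',{'W','C'});

% Indices of nodes whose cut is categorical.
catNodes = find(strcmp(tree.CutType,'categorical'));

% Left (column 1) and right (column 2) category sets for those nodes.
disp(tree.CutCategories(catNodes,:))

% Leaf nodes should have an empty cut type and empty category sets.
leafNodes = ~tree.IsBranchNode;
disp(all(cellfun(@isempty,tree.CutCategories(leafNodes,1))))
```

Rows of CutCategories that belong to continuous cuts or leaf nodes are expected to be empty in both columns, which is why the sketch filters by CutType first.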

This property is read-only.

Indicator of branch nodes, returned as an n-element logical vector that is true for each branch node and false for each leaf node of tree.

Data Types: logical
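The node-level properties can be combined to trace a prediction by hand. The sketch below is a simplified walk, assuming a single continuous predictor with no missing value in the query point and default (equal) observation weights; it uses the carsmall sample data and the query value 3000 purely for illustration, and it is not how predict is implemented internally.

```matlab
% Fit a tree with one continuous predictor.
load carsmall
tree = fitrtree(Weight,MPG);

x = 3000;      % query point (hypothetical car weight)
node = 1;      % start at the root node

% Descend until a leaf is reached: go left when x < cut point, else right.
while tree.IsBranchNode(node)
    if x < tree.CutPoint(node)
        node = tree.Children(node,1);
    else
        node = tree.Children(node,2);
    end
end

manualPred = tree.NodeMean(node);   % mean response in the leaf
builtinPred = predict(tree,x);      % prediction from the object function
fprintf('Manual: %.4f, predict: %.4f\n', manualPred, builtinPred);
```

Under these assumptions the two values should agree, because the fitted response of a leaf is the mean of the training responses that reach it.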

This property is read-only.

Parameters used in training tree, returned as a TreeParams object. To display all parameter values, enter tree.ModelParameters. To access a particular parameter, use dot notation.

This property is read-only.

Mean squared error for each node in tree, returned as an n-element vector, where n is the number of nodes in the tree.

Data Types: double

This property is read-only.

Mean observation values for each node in tree, returned as an n-element vector, where n is the number of nodes in the tree. Every element in NodeMean is the average of the true Y values over all observations in the node.

Data Types: double

This property is read-only.

Proportion of observations in original data that satisfy the conditions for each node in tree, returned as an n-element vector, where n is the number of nodes in the tree.

Data Types: double

This property is read-only.

Risk of each node in tree, returned as an n-element vector, where n is the number of nodes in the tree. The risk for each node is the node error weighted by the node probability.

Data Types: double
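The relationship between node error, node probability, and node risk described above can be checked numerically. This is a minimal sketch on the carsmall sample data (chosen only for illustration); riskCheck and maxDiff are hypothetical variable names.

```matlab
load carsmall
tree = fitrtree(Weight,MPG);

% Recompute the risk as node error weighted by node probability.
riskCheck = tree.NodeError .* tree.NodeProbability;

% Largest discrepancy from the stored NodeRisk values.
maxDiff = max(abs(riskCheck - tree.NodeRisk));
fprintf('Largest difference: %g\n', maxDiff);

% The root node contains every observation, so its probability should be 1.
fprintf('Root node probability: %g\n', tree.NodeProbability(1));
```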

This property is read-only.

Size of the nodes in tree, returned as an n-element vector, where n is the number of nodes in the tree. The size of a node is the number of observations from the data used to create the tree that satisfy the conditions for the node.

Data Types: double

This property is read-only.

The number of nodes in tree, returned as a positive integer.

Data Types: double

This property is read-only.

Parent of each node in tree, returned as an n-element integer vector, where n is the number of nodes in the tree. The parent of the root node is 0.

Data Types: double

Alpha values for pruning the tree, returned as a real vector with one element per pruning level. If the pruning level ranges from 0 to M, then PruneAlpha has M + 1 elements sorted in ascending order. PruneAlpha(1) is for pruning level 0 (no pruning), PruneAlpha(2) is for pruning level 1, and so on.

For the meaning of the α values, see How Decision Trees Create a Pruning Sequence.

Data Types: double

Pruning levels of each node in the tree, returned as an integer vector with NumNodes elements. The pruning levels range from 0 (no pruning) to M, where M is the distance between the deepest leaf and the root node.

For details, see Pruning.

Data Types: double
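To see how PruneAlpha and PruneList fit together, the sketch below compares their sizes and prunes a tree by one level. It assumes the default fitrtree behavior of computing the pruning sequence during training, and uses the carsmall sample data for illustration only.

```matlab
load carsmall
tree = fitrtree(Weight,MPG);

M = max(tree.PruneList);             % highest pruning level in the tree
numAlpha = numel(tree.PruneAlpha);   % expected to equal M + 1
fprintf('M = %d, numel(PruneAlpha) = %d\n', M, numAlpha);

% Prune by one level (or not at all if the tree has a single level).
pruned = prune(tree,'Level',min(1,M));
fprintf('Nodes before: %d, after: %d\n', tree.NumNodes, pruned.NumNodes);
```

Pruning by level rather than by an explicit alpha value keeps the example independent of the particular alpha values computed for this data.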

This property is read-only.

Categories used for surrogate splits, returned as an n-element cell array, where n is the number of nodes in tree. For each node k, SurrogateCutCategories{k} is a cell array. The length of SurrogateCutCategories{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogateCutCategories{k} is either an empty character vector for a continuous surrogate predictor, or a two-element cell array with categories for a categorical surrogate predictor. The first element of this two-element cell array lists categories assigned to the left child by this surrogate split, and the second element lists categories assigned to the right child by this surrogate split. The order of the surrogate split variables at each node is matched to the order of variables in SurrogateCutPredictor. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutCategories contains an empty cell.

Data Types: cell

This property is read-only.

Numeric cut assignments used for surrogate splits in tree, returned as an n-element cell array, where n is the number of nodes in tree. For each node k, SurrogateCutFlip{k} is a numeric vector. The length of SurrogateCutFlip{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogateCutFlip{k} is either zero for a categorical surrogate predictor, or a numeric cut assignment for a continuous surrogate predictor. The numeric cut assignment can be either -1 or +1. For every surrogate split with a numeric cut C based on a continuous predictor variable Z, the left child is chosen if Z < C and the cut assignment for this surrogate split is +1, or if Z ≥ C and the cut assignment for this surrogate split is -1. Similarly, the right child is chosen if Z ≥ C and the cut assignment for this surrogate split is +1, or if Z < C and the cut assignment for this surrogate split is -1. The order of the surrogate split variables at each node is matched to the order of variables in SurrogateCutPredictor. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutFlip contains an empty array.

Data Types: cell

This property is read-only.

Numeric values used for surrogate splits in tree, returned as an n-element cell array, where n is the number of nodes in tree. For each node k, SurrogateCutPoint{k} is a numeric vector. The length of SurrogateCutPoint{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogateCutPoint{k} is either NaN for a categorical surrogate predictor, or a numeric cut for a continuous surrogate predictor. For every surrogate split with a numeric cut C based on a continuous predictor variable Z, the left child is chosen if Z < C and SurrogateCutFlip for this surrogate split is +1, or if Z ≥ C and SurrogateCutFlip for this surrogate split is -1. Similarly, the right child is chosen if Z ≥ C and SurrogateCutFlip for this surrogate split is +1, or if Z < C and SurrogateCutFlip for this surrogate split is -1. The order of the surrogate split variables at each node is matched to the order of variables returned by SurrogateCutPredictor. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutPoint contains an empty cell.

Data Types: cell

This property is read-only.

Names of the variables used for surrogate splits in each node in tree, returned as an n-element cell array, where n is the number of nodes in tree. Every element of SurrogateCutPredictor is a cell array with the names of the surrogate split variables at this node. The variables are sorted in descending order by their predictive measure of association with the optimal predictor, and only variables with a positive predictive measure are included. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutPredictor contains an empty cell.

Data Types: cell

This property is read-only.

Types of surrogate splits at each node in tree, returned as an n-element cell array, where n is the number of nodes in tree. For each node k, SurrogateCutType{k} is a cell array with the types of the surrogate split variables at this node. The variables are sorted in descending order by their predictive measure of association with the optimal predictor, and only variables with a positive predictive measure are included. The order of the surrogate split variables at each node is matched to the order of variables in SurrogateCutPredictor. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutType contains an empty cell. A surrogate split type can be either 'continuous' if the cut is defined in the form Z < V for a variable Z and cut point V, or 'categorical' if the cut is defined by whether Z takes a value in a set of categories.

Data Types: cell

This property is read-only.

Predictive measures of association for surrogate splits in tree, returned as an n-element cell array, where n is the number of nodes in tree. For each node k, SurrogatePredictorAssociation{k} is a numeric vector. The length of SurrogatePredictorAssociation{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogatePredictorAssociation{k} gives the predictive measure of association between the optimal split and this surrogate split. The order of the surrogate split variables at each node is the order of variables in SurrogateCutPredictor. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogatePredictorAssociation contains an empty cell.

Data Types: cell
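The surrogate properties are only populated when surrogate splits are requested at training time. The following sketch, which assumes the carsmall sample data and the hypothetical predictor names 'W', 'HP', and 'D', trains a tree with 'Surrogate','on' and inspects the surrogate information stored for the root node.

```matlab
load carsmall
X = [Weight Horsepower Displacement];
tree = fitrtree(X,MPG,'Surrogate','on', ...
    'PredictorNames',{'W','HP','D'});

k = 1;   % root node

% Surrogate predictors at node k, in the order shared by all
% surrogate properties.
names = tree.SurrogateCutPredictor{k};
assoc = tree.SurrogatePredictorAssociation{k};
cuts  = tree.SurrogateCutPoint{k};
flips = tree.SurrogateCutFlip{k};

fprintf('Optimal split variable at node %d: %s\n', k, tree.CutPredictor{k});
for j = 1:numel(names)
    fprintf('  surrogate %s: association %.3f, cut %.2f, flip %d\n', ...
        names{j}, assoc(j), cuts(j), flips(j));
end
```

Because all three predictors here are continuous, each surrogate has a numeric cut point and a flip of +1 or -1; a categorical surrogate would instead show NaN and 0.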

Predictor Properties

This property is read-only.

Data Types: cell

This property is read-only.

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).

Data Types: single | double

This property is read-only.

Expanded predictor names, returned as a cell array of character vectors.

If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.

Data Types: cell

This property is read-only.

Predictor names, specified as a cell array of character vectors. The order of the entries in PredictorNames is the same as in the training data.

Data Types: cell

This property is read-only.

Predictor values, returned as a real matrix or table. Each column of X represents one variable (predictor), and each row represents one observation.

Data Types: double | table

Response Properties

This property is read-only.

Name of the response variable, returned as a character vector.

Data Types: char

Function for transforming the raw response values (mean squared error), specified as a function handle or 'none'. The default 'none' means no transformation; equivalently, 'none' means @(x)x. A function handle must accept a matrix of response values and return a matrix of the same size.

Add or change a ResponseTransform function using dot notation:

tree.ResponseTransform = @function

Data Types: char | function_handle
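One common use of ResponseTransform is to map predictions back to the original scale after training on a transformed response. The sketch below assumes the carsmall sample data and a log-transformed response, both chosen for illustration; rawPred and mpgPred are hypothetical variable names.

```matlab
load carsmall

% Train on the log of the response.
tree = fitrtree(Weight,log(MPG));
rawPred = predict(tree,3000);    % prediction on the log scale

% Have predict return values on the original MPG scale.
tree.ResponseTransform = @exp;
mpgPred = predict(tree,3000);

fprintf('log scale: %.4f, original scale: %.4f\n', rawPred, mpgPred);
```

The transform is applied to the raw leaf value, so mpgPred should equal exp(rawPred). It is not the same as the mean of the untransformed responses in the leaf.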

This property is read-only.

Response data, returned as a numeric column vector with the same number of rows as X. Each entry in Y is the response to the data in the corresponding row of X.

Data Types: double

Other Data Properties

This property is read-only.

Number of observations in the training data, returned as a positive integer. NumObservations can be less than the number of rows of input data when there are missing values in the input data or response data.

Data Types: double

This property is read-only.

Rows of the original predictor data X used for fitting, returned as an n-element logical vector, where n is the number of rows of X. If the software uses all rows of X to create the object, then RowsUsed is an empty array ([]).

Data Types: logical
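The effect of missing values on NumObservations and RowsUsed can be seen with data that contains NaN responses. This sketch assumes the carsmall sample data, in which some MPG values are missing; nRows, nUsed, and droppedRows are illustrative names.

```matlab
load carsmall
tree = fitrtree(Weight,MPG);

nRows = numel(MPG);             % rows passed to fitrtree
nUsed = tree.NumObservations;   % rows actually used for training
fprintf('Rows supplied: %d, rows used: %d\n', nRows, nUsed);

% RowsUsed is empty when every row is used, so guard before indexing.
if ~isempty(tree.RowsUsed)
    droppedRows = find(~tree.RowsUsed);
    fprintf('Number of dropped rows: %d\n', numel(droppedRows));
end
```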

This property is read-only.

Scaled weights in tree, returned as a numeric vector. W has length n, the number of rows in the training data.

Data Types: double

Object Functions

compact	Reduce size of machine learning model
crossval	Cross-validate machine learning model
cvloss	Regression error by cross-validation for regression tree model
gather	Gather properties of Statistics and Machine Learning Toolbox object from GPU
lime	Local interpretable model-agnostic explanations (LIME)
loss	Regression error for regression tree model
nodeVariableRange	Retrieve variable range of decision tree node
partialDependence	Compute partial dependence
plotPartialDependence	Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
predict	Predict responses using regression tree model
predictorImportance	Estimates of predictor importance for regression tree
prune	Produce sequence of regression subtrees by pruning regression tree
resubLoss	Resubstitution loss for regression tree model
resubPredict	Predict response of regression tree by resubstitution
shapley	Shapley values
surrogateAssociation	Mean predictive measure of association for surrogate splits in regression tree
view	View regression tree
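As a brief illustration of how two of these object functions relate, the sketch below compares resubLoss with the mean squared error computed directly from resubPredict. It assumes the carsmall sample data, default equal observation weights, and the default loss function (mean squared error); mseManual and mseBuiltin are hypothetical names.

```matlab
load carsmall
tree = fitrtree(Weight,MPG);

% Predictions for the stored training data.
yFit = resubPredict(tree);

% Mean squared error computed by hand from the stored responses.
mseManual = mean((tree.Y - yFit).^2);

% Resubstitution loss reported by the object function.
mseBuiltin = resubLoss(tree);

fprintf('Manual MSE: %.4f, resubLoss: %.4f\n', mseManual, mseBuiltin);
```

With nonuniform weights the manual calculation would need to weight each squared residual by tree.W instead of taking a plain mean.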

Examples

Load the sample data.

load carsmall

Construct a regression tree using the sample data. The response variable is miles per gallon, MPG.

tree = fitrtree([Weight, Cylinders],MPG,...
    'CategoricalPredictors',2,'MinParentSize',20,...
    'PredictorNames',{'W','C'})

tree = 
  RegressionTree
           PredictorNames: {'W'  'C'}
             ResponseName: 'Y'
    CategoricalPredictors: 2
        ResponseTransform: 'none'
          NumObservations: 94

Predict the mileage of 4,000-pound cars with 4, 6, and 8 cylinders.

MPG4Kpred = predict(tree,[4000 4; 4000 6; 4000 8])

MPG4Kpred = 3×1

   19.2778
   19.2778
   14.3889

References

[1] Breiman, L., J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Boca Raton, FL: CRC Press, 1984.

Extended Capabilities

Usage notes and limitations:

The predict and update functions support code generation.
To integrate the prediction of a regression tree model into Simulink®, you can use the RegressionTree Predict block in the Statistics and Machine Learning Toolbox™ library or a MATLAB® Function block with the predict function.
When you train a regression tree model by using fitrtree, the following restrictions apply.
- The value of the ResponseTransform name-value argument cannot be an anonymous function. For fixed-point code generation, the value must be 'none' (default).
- You cannot use surrogate splits; that is, the value of the Surrogate name-value argument must be 'off'.
- Fixed-point code generation and code generation with a coder configurer do not support categorical predictors (logical, categorical, char, string, or cell). You cannot use the CategoricalPredictors name-value argument. To include categorical predictors in a model, preprocess them by using dummyvar before fitting the model.

For more information, see Introduction to Code Generation.
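The dummyvar preprocessing mentioned in the restrictions above might look like the following sketch. It assumes the carsmall sample data and treats Cylinders as the categorical predictor; cylDummies is a hypothetical variable name, and the remaining code generation steps are omitted.

```matlab
load carsmall

% Expand the categorical predictor into one indicator column per category.
cylDummies = dummyvar(categorical(Cylinders));

% Combine the continuous predictor with the indicator columns and fit
% without the CategoricalPredictors argument.
X = [Weight cylDummies];
tree = fitrtree(X,MPG);

fprintf('Indicator columns: %d, total predictors: %d\n', ...
    size(cylDummies,2), size(X,2));
```

New data passed to predict must be encoded the same way, with the indicator columns in the same order as during training.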

Usage notes and limitations:

The following object functions fully support GPU arrays:
The following object functions offer limited support for GPU arrays:
The object functions execute on a GPU if at least one of the following applies:
- The model was fitted with GPU arrays.
- The predictor data that you pass to the object function is a GPU array.
- The response data that you pass to the object function is a GPU array.

For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

Version History

Introduced in R2011a