rlContinuousGaussianTransitionFunction - Stochastic Gaussian transition function approximator object for neural network-based environment - MATLAB ([original](https://www.mathworks.com/help/reinforcement-learning/ref/rl.function.rlcontinuousgaussiantransitionfunction.html))

Stochastic Gaussian transition function approximator object for neural network-based environment

Since R2022a

Description

When creating a neural network-based environment using rlNeuralNetworkEnvironment, you can specify stochastic transition function approximators using rlContinuousGaussianTransitionFunction objects.

A transition function approximator object uses a deep neural network as its internal approximation model to predict the next observations based on the current observations and actions.

To specify deterministic transition function approximators, use rlContinuousDeterministicTransitionFunction objects.

Creation

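Syntax

`tsnFcnAppx` = rlContinuousGaussianTransitionFunction(`net`,`observationInfo`,`actionInfo`,`Name=Value`)

Description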

`tsnFcnAppx` = rlContinuousGaussianTransitionFunction(`net`,`observationInfo`,`actionInfo`,`Name=Value`) creates the stochastic transition function approximator object `tsnFcnAppx` using the deep neural network `net` and sets the ObservationInfo and ActionInfo properties.

When creating a stochastic transition function approximator, you must specify the names of the deep neural network inputs and outputs using the ObservationInputNames, ActionInputNames, NextObservationMeanOutputNames, and NextObservationStandardDeviationOutputNames name-value arguments.

You can also specify the PredictDiff and UseDevice properties using optional name-value arguments. For example, to use a GPU for prediction, specify UseDevice="gpu".
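As a minimal sketch of this syntax with the optional properties set, the following code builds a small two-input, two-output network and creates the approximator with PredictDiff=true. It assumes a four-element observation and a scalar action specified manually with rlNumericSpec; the layer names obs, act, mean, stdFc, and std, and the layer sizes, are hypothetical choices for illustration.

```matlab
% Hypothetical specifications: 4-by-1 observation, scalar action
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1]);

% Hypothetical network: two inputs (observation, action),
% two outputs (next-observation mean and standard deviation)
net = dlnetwork;
net = addLayers(net,featureInputLayer(4,Name="obs"));
net = addLayers(net,featureInputLayer(1,Name="act"));
net = addLayers(net,[
    concatenationLayer(1,2,Name="concat")
    fullyConnectedLayer(32,Name="fc")
    reluLayer(Name="relu")
    ]);
net = addLayers(net,fullyConnectedLayer(4,Name="mean"));
net = addLayers(net,[
    fullyConnectedLayer(4,Name="stdFc")
    softplusLayer(Name="std")   % keeps standard deviations positive
    ]);
net = connectLayers(net,"obs","concat/in1");
net = connectLayers(net,"act","concat/in2");
net = connectLayers(net,"relu","mean");
net = connectLayers(net,"relu","stdFc");
net = initialize(net);

% Create the approximator; PredictDiff and UseDevice are optional
tsnFcnAppx = rlContinuousGaussianTransitionFunction(net,obsInfo,actInfo, ...
    ObservationInputNames="obs", ...
    ActionInputNames="act", ...
    NextObservationMeanOutputNames="mean", ...
    NextObservationStandardDeviationOutputNames="std", ...
    PredictDiff=true, ...
    UseDevice="cpu");
```

The softplus layer on the standard deviation path is one common way to keep that output positive; the example in the Examples section makes the same choice.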


Input Arguments


net — Deep neural network

dlnetwork object

Deep neural network, specified as a dlnetwork object.

The input layer names for this network must match the input names specified using ObservationInputNames and ActionInputNames. The dimensions of the input layers must match the dimensions of the corresponding observation and action specifications in ObservationInfo and ActionInfo, respectively.

The output layer names for this network must match the output names specified using NextObservationMeanOutputNames and NextObservationStandardDeviationOutputNames. The dimensions of the output layers must match the dimensions of the corresponding observation specifications in ObservationInfo.
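Mismatched layer names are an easy mistake to make. The following hypothetical helper, checkTransitionNetNames (not part of Reinforcement Learning Toolbox), sketches a pre-construction check that compares the names you plan to pass against the InputNames and OutputNames properties of a dlnetwork object. It checks names only, not dimensions. Save it as checkTransitionNetNames.m.

```matlab
function checkTransitionNetNames(net,obsNames,actNames,meanNames,stdNames)
% checkTransitionNetNames (hypothetical helper): verify that the names
% intended for ObservationInputNames, ActionInputNames,
% NextObservationMeanOutputNames, and
% NextObservationStandardDeviationOutputNames exist in the network.
arguments
    net dlnetwork
    obsNames string
    actNames string
    meanNames string
    stdNames string
end
inNames  = string(net.InputNames);    % names of the network input layers
outNames = string(net.OutputNames);   % names of the network output layers

% Names requested by the caller but absent from the network
missingIn  = setdiff([obsNames(:); actNames(:)],inNames);
missingOut = setdiff([meanNames(:); stdNames(:)],outNames);

if ~isempty(missingIn)
    error("Input names not found in network: %s",strjoin(missingIn,", "));
end
if ~isempty(missingOut)
    error("Output names not found in network: %s",strjoin(missingOut,", "));
end
end
```

For the network built in the Examples section, a call might look like checkTransitionNetNames(tsnNet,"obs","act","nextObsMean","nextObsStd").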

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: ObservationInputNames="velocity"

ObservationInputNames — Observation input layer names

string | string array

Observation input layer names, specified as a string or string array.

The number of observation input names must match the length of ObservationInfo, and the order of the names must match the order of the specifications in ObservationInfo.

ActionInputNames — Action input layer names

string | string array

Action input layer names, specified as a string or string array.

The number of action input names must match the length of ActionInfo, and the order of the names must match the order of the specifications in ActionInfo.

NextObservationMeanOutputNames — Next observation mean output layer names

string | string array

Next observation mean output layer names, specified as a string or string array.

The number of next observation mean output names must match the length of ObservationInfo, and the order of the names must match the order of the specifications in ObservationInfo.

NextObservationStandardDeviationOutputNames — Next observation standard deviation output layer names

string | string array

Next observation standard deviation output layer names, specified as a string or string array.

The number of next observation standard deviation output names must match the length of ObservationInfo and the order of the names must match the order of the specifications in ObservationInfo.

Properties


PredictDiff — Option to predict the difference between the current observation and the next observation

false (default) | true

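Option to predict the difference between the current observation and the next observation, specified as one of the following logical values.

- false: the network outputs describe the next observation directly.
- true: the network outputs describe the difference between the next observation and the current observation, and the next observation is obtained by adding this difference to the current observation.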

Example: true
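The following conceptual sketch, in plain MATLAB, illustrates how a next observation can be sampled when PredictDiff is true. The numbers are hypothetical stand-ins for the mean and standard deviation outputs of the network, and the sampling step is an assumption about the computation, not the toolbox's internal code.

```matlab
% Hypothetical current observation and network outputs
obs      = [0.10; -0.20; 0.05; 0.30];    % current observation
diffMean = [0.01;  0.02; -0.01; 0.00];   % mean of (next - current)
diffStd  = [0.001; 0.002; 0.001; 0.001]; % standard deviation of (next - current)

% Sample the difference from a Gaussian, then add it to the current observation
sampledDiff = diffMean + diffStd .* randn(size(diffMean));
nextObs = obs + sampledDiff;

% With PredictDiff=false, the outputs would describe nextObs itself:
% nextObs = nextObsMean + nextObsStd .* randn(size(nextObsMean));
```

Predicting differences can be convenient when consecutive observations are close to each other, because the network then learns a small correction rather than the full next state.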

ObservationInfo — Observation specifications

rlNumericSpec object | array of rlNumericSpec objects

Observation specifications, specified as an rlNumericSpec object or an array of such objects. Each element in the array defines the properties of an environment observation channel, such as its dimensions, data type, and name.

When you create the approximator object, the constructor function sets the ObservationInfo property to the input argument observationInfo.

You can extract observationInfo from an existing environment, function approximator, or agent using getObservationInfo. You can also construct the specifications manually using rlNumericSpec.

Example: [rlNumericSpec([2 1]) rlNumericSpec([1 1])]
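As a sketch of constructing the specifications manually, the following code creates a hypothetical four-element observation channel with finite limits, and a two-channel specification array like the one in the example line above. The dimensions, limits, and name are illustrative assumptions.

```matlab
% Hypothetical 4-by-1 observation channel with finite lower and upper limits
obsInfo = rlNumericSpec([4 1], ...
    LowerLimit=-ones(4,1), ...
    UpperLimit=ones(4,1));
obsInfo.Name = "observations";

% Two observation channels: a 2-by-1 channel and a scalar channel
obsInfoMulti = [rlNumericSpec([2 1]) rlNumericSpec([1 1])];
```

With a two-channel array such as obsInfoMulti, you would pass two names to each of ObservationInputNames, NextObservationMeanOutputNames, and NextObservationStandardDeviationOutputNames, in the same order as the specifications.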

ActionInfo — Action specifications

rlFiniteSetSpec object | rlNumericSpec object

Action specifications, specified either as an rlFiniteSetSpec (for discrete action spaces) or rlNumericSpec (for continuous action spaces) object. This object defines the properties of the environment action channel, such as its dimensions, data type, and name.

Note

For this approximator object, only one action channel is allowed.

When you create the approximator object, the constructor function sets the ActionInfo property to the input argument actionInfo.

You can extract ActionInfo from an existing environment or agent using getActionInfo. You can also construct the specifications manually using rlFiniteSetSpec or rlNumericSpec.

Example: rlNumericSpec([2 1])
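As a sketch, the following code shows the two kinds of single-channel action specification described above. The limits and the set of allowed values are hypothetical.

```matlab
% Continuous action: hypothetical scalar force limited to [-10, 10]
actInfoCont = rlNumericSpec([1 1],LowerLimit=-10,UpperLimit=10);
actInfoCont.Name = "force";

% Discrete action: hypothetical set of three allowed force values
actInfoDisc = rlFiniteSetSpec([-10 0 10]);
```

Either object describes a single action channel, which is the only configuration this approximator allows.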

Normalization — Normalization method

"none" (default) | string array

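Normalization method, returned as an array in which each element (one for each input channel defined in the observationInfo and actionInfo properties, in that order) is a normalization method name, for example "none" (no normalization) or "rescale-symmetric" (rescale the input to the interval [-1, 1]).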

Note

When you specify the Normalization property of rlAgentInitializationOptions, normalization is applied only to the approximator input channels corresponding to rlNumericSpec specification objects in which both the UpperLimit and LowerLimit properties are defined. After you create the agent, you can use setNormalizer to assign normalizers that use any normalization method. For more information on normalizer objects, see rlNormalizer.

Example: "rescale-symmetric"
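The note above explains why finite limits matter: rescaling needs both ends of the input range. As an illustration of the arithmetic only, the following plain MATLAB sketch assumes the standard linear mapping from [LowerLimit, UpperLimit] to [-1, 1]; it is not taken from the toolbox implementation, and the limits and input values are hypothetical.

```matlab
% Hypothetical limits and input for a two-element channel
lo = [-2; 0];    % LowerLimit
hi = [ 2; 10];   % UpperLimit
x  = [ 1; 5];    % raw input

% Linear map: lo -> -1, hi -> +1
xNorm = 2*(x - lo)./(hi - lo) - 1;   % gives [0.5; 0]
```

If either limit were infinite, hi - lo would be infinite and the mapping would be undefined, which is consistent with normalization being applied only to channels that define both limits.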

UseDevice — Computation device used for training and simulation

"cpu" (default) | "gpu"

Computation device used to perform operations such as gradient computation, parameter update, and prediction during training and simulation, specified as either "cpu" or "gpu".

The "gpu" option requires both Parallel Computing Toolbox™ software and a CUDA®-enabled NVIDIA® GPU. For more information on supported GPUs, see GPU Computing Requirements (Parallel Computing Toolbox).

You can use gpuDevice (Parallel Computing Toolbox) to query or select a local GPU device to be used with MATLAB®.

Note

Training or simulating an agent on a GPU involves device-specific numerical round-off errors. Because of these errors, you can get different results on a GPU and on a CPU for the same operation.

To speed up training by using parallel processing over multiple cores, you do not need to use this argument. Instead, when training your agent, use an rlTrainingOptions object in which the UseParallel option is set to true. For more information about training using multicore processors and GPUs, see Train Agents Using Parallel Computing and GPUs.

Example: "gpu"
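The following sketch shows one way to choose the device at run time and, separately, how multicore parallel training is requested through the training options rather than through UseDevice. The variable name device is an illustrative choice.

```matlab
% Use "gpu" only if Parallel Computing Toolbox and a supported GPU are available
if canUseGPU
    device = "gpu";
else
    device = "cpu";
end
% Pass the value when creating the approximator, for example:
%   tsnFcnAppx = rlContinuousGaussianTransitionFunction(net,obsInfo,actInfo, ...
%       <input and output names>, UseDevice=device);

% Multicore parallel training is configured in the training options instead
trainOpts = rlTrainingOptions(UseParallel=true);
```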

Learnables — Learnable parameters of approximator object

cell array of dlarray objects

Learnable parameters of the approximator object, specified as a cell array of dlarray objects. This property contains the learnable parameters of the approximation model used by the approximator object.

Example: {dlarray(rand(256,4)),dlarray(rand(256,1))}

State — State of approximator object

cell array of dlarray objects

State of the approximator object, specified as a cell array of dlarray objects. For dlnetwork-based models, this property contains the Value column of the State property table of the dlnetwork model. The elements of the cell array are the states of the recurrent neural network used in the approximator (if any) and the states of any batch normalization layers (if used).

For model types that are not based on a dlnetwork object, this property is an empty cell array, since these model types do not support states.

Example: {dlarray(rand(256,1)),dlarray(rand(256,1))}


Examples


Create Stochastic Transition Function and Predict Next Observation

Create an environment interface and extract observation and action specifications. Alternatively, you can create specifications using rlNumericSpec and rlFiniteSetSpec.

env = rlPredefinedEnv("CartPole-Continuous");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

Define the layers for the deep neural network. The network has two input channels, one for the current observations and one for the current actions. The output of the network is the predicted Gaussian distribution for each next observation. The two output channels correspond to the means and standard deviations of these distributions.

% Define paths.
statePath = featureInputLayer(obsInfo.Dimension(1),Name="obs");
actionPath = featureInputLayer(actInfo.Dimension(1),Name="act");
commonPath = [
    concatenationLayer(1,2,Name="concat")
    fullyConnectedLayer(32,Name="fc")
    reluLayer(Name="CriticRelu1")
    fullyConnectedLayer(32,Name="fc2")
    ];
meanPath = [
    reluLayer(Name="nextObsMeanRelu")
    fullyConnectedLayer(obsInfo.Dimension(1),Name="nextObsMean")
    ];
stdPath = [
    reluLayer(Name="nextObsStdRelu")
    fullyConnectedLayer(obsInfo.Dimension(1),Name="nextObsStdReluFull")
    softplusLayer(Name="nextObsStd")
    ];

% Create dlnetwork object and add layers.
tsnNet = dlnetwork;
tsnNet = addLayers(tsnNet,statePath);
tsnNet = addLayers(tsnNet,actionPath);
tsnNet = addLayers(tsnNet,commonPath);
tsnNet = addLayers(tsnNet,meanPath);
tsnNet = addLayers(tsnNet,stdPath);

% Connect paths.
tsnNet = connectLayers(tsnNet,"obs","concat/in1");
tsnNet = connectLayers(tsnNet,"act","concat/in2");
tsnNet = connectLayers(tsnNet,"fc2","nextObsMeanRelu");
tsnNet = connectLayers(tsnNet,"fc2","nextObsStdRelu");

% Plot network.
plot(tsnNet)


Initialize network and display the number of weights.

tsnNet = initialize(tsnNet);
summary(tsnNet)

   Initialized: true

   Number of learnables: 1.5k

   Inputs:
      1   'obs'   4 features
      2   'act'   1 features

Create a stochastic transition function object.

tsnFcnAppx = rlContinuousGaussianTransitionFunction(tsnNet, ...
    obsInfo,actInfo, ...
    ObservationInputNames="obs", ...
    ActionInputNames="act", ...
    NextObservationMeanOutputNames="nextObsMean", ...
    NextObservationStandardDeviationOutputNames="nextObsStd");

Using this transition function object, you can predict the next observation based on the current observation and action. For example, predict the next observation for a random observation and action. The next observation values are sampled from Gaussian distributions with the means and standard deviations output by the transition network.

observation = rand(obsInfo.Dimension);
action = rand(actInfo.Dimension);
nextObs = predict(tsnFcnAppx,{observation},{action})

nextObs = 1x1 cell array
    {4x1 single}

nextObs{1}

ans = 4x1 single column vector

    1.2414
    0.7307
   -0.5588
   -0.9567

You can also obtain the mean value and standard deviation of the Gaussian distribution of the predicted next observation using evaluate.

nextObsDist = evaluate(tsnFcnAppx,{observation,action})

nextObsDist=1×2 cell array
    {4x1 single}    {4x1 single}
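Continuing the example above, the following sketch draws one sample manually from the distribution returned by evaluate and counts the learnable parameters stored in the Learnables property. It assumes tsnFcnAppx and nextObsDist are still in the workspace; the manual draw is a conceptual equivalent of what predict does with PredictDiff at its default value of false, not the toolbox's internal code.

```matlab
% Means and standard deviations of the next-observation Gaussians
mu    = nextObsDist{1};
sigma = nextObsDist{2};   % positive, because of the softplus output layer

% Draw one sample: mean plus standard deviation times standard normal noise
manualSample = mu + sigma .* randn(size(mu),"like",mu);

% Total number of learnable parameters in the approximation model
numLearnables = sum(cellfun(@numel,tsnFcnAppx.Learnables));
```

For this network, numLearnables should be 1512 (192 for fc, 1056 for fc2, and 132 for each of the two output layers), which summary reports as 1.5k.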

Version History

Introduced in R2022a
