rlContinuousDeterministicActor
Deterministic actor with a continuous action space for reinforcement learning agents
Since R2022a
Description
This object implements a function approximator to be used as a deterministic actor within a reinforcement learning agent with a continuous action space. A continuous deterministic actor takes an environment observation as input and returns as output an action that is a parametrized deterministic function of the observation, thereby implementing a parametrized deterministic policy. After you create an rlContinuousDeterministicActor object, use it to create a suitable agent, such as rlDDPGAgent. For more information on creating actors and critics, see Create Policies and Value Functions.
Creation
Syntax
Description
`actor` = rlContinuousDeterministicActor(net,observationInfo,actionInfo) creates a continuous deterministic actor object using the deep neural network net as the underlying approximation model. For this actor, actionInfo must specify a continuous action space. The network input layers are automatically associated with the environment observation channels according to the dimension specifications in observationInfo. The network must have a single output layer with the same data type and dimensions as the action specified in actionInfo. This function sets the ObservationInfo and ActionInfo properties of actor to the observationInfo and actionInfo input arguments, respectively.
`actor` = rlContinuousDeterministicActor({basisFcn,W0},observationInfo,actionInfo) creates a continuous deterministic actor object using a custom basis function as the underlying approximation model. The first input argument is a two-element cell array whose first element is the handle basisFcn to a custom basis function and whose second element is the initial weight matrix W0. This function sets the ObservationInfo and ActionInfo properties of actor to the observationInfo and actionInfo input arguments, respectively.
`actor` = rlContinuousDeterministicActor(___,Name=Value) specifies names of the observation input layers (for network-based approximators) or sets the UseDevice property using one or more name-value arguments. Specifying the input layer names allows you to explicitly associate the layers of your network approximator with specific environment channels. For all types of approximators, you can specify the device where computations for actor are executed, for example UseDevice="gpu".
Input Arguments
net — Deep neural network
array of Layer objects | layerGraph object | DAGNetwork object | SeriesNetwork object | dlnetwork object (preferred)
Deep neural network used as the underlying approximation model within the actor, specified as one of the following:
- Array of Layer objects
- layerGraph object
- DAGNetwork object
- SeriesNetwork object
- dlnetwork object
Note
Among the different network representation options, dlnetwork is preferred, since it has built-in validation checks and supports automatic differentiation. If you pass another network object as an input argument, it is internally converted to a dlnetwork object. However, best practice is to convert other representations to dlnetwork explicitly before using it to create a critic or an actor for a reinforcement learning agent. You can do so using dlnet = dlnetwork(net), where net is any Deep Learning Toolbox™ neural network object. The resulting dlnet is the dlnetwork object that you use for your critic or actor. This practice allows a greater level of insight and control for cases in which the conversion is not straightforward and might require additional specifications.
The network must have as many input layers as the number of environment observation channels (with each input layer receiving input from an observation channel), and a single output layer returning the action.
rlContinuousDeterministicActor
objects support recurrent deep neural networks. For an example, see Create Deterministic Actor from Recurrent Neural Network.
The learnable parameters of the actor are the weights of the deep neural network. For a list of deep neural network layers, see List of Deep Learning Layers. For more information on creating deep neural networks for reinforcement learning, see Create Policies and Value Functions.
basisFcn — Custom basis function
function handle
Custom basis function, specified as a function handle to a user-defined MATLAB function. The user-defined function can be either an anonymous function or a function on the MATLAB path. The action to be taken based on the current observation, which is the output of the actor, is the vector a = W'*B, where W is a weight matrix containing the learnable parameters and B is the column vector returned by the custom basis function.
Your basis function must have the following signature.
B = myBasisFunction(obs1,obs2,...,obsN)
Here, obs1
to obsN
are inputs in the same order and with the same data type and dimensions as the environment observation channels defined in observationInfo
.
Example: @(obs1,obs2,obs3) [obs3(2)*obs1(1)^2; abs(obs2(5)+obs3(1))]
W0 — Initial value of basis function weights
matrix
Initial value of the basis function weights W
, specified as a matrix having as many rows as the length of the vector returned by the basis function and as many columns as the dimension of the action space.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: UseDevice="gpu"
ObservationInputNames — Network input layer names corresponding to the environment observation channels
string array | cell array of strings | cell array of character vectors
Network input layer names corresponding to the environment observation channels, specified as a string array or a cell array of strings or character vectors. The function assigns, in sequential order, each environment observation channel specified in observationInfo to each layer whose name is specified in the array assigned to this argument. Therefore, the specified network input layers, ordered as indicated in this argument, must have the same data type and dimensions as the observation channels, as ordered in observationInfo.
This name-value argument is supported only when the approximation model is a deep neural network.
Example: ObservationInputNames={"obsInLyr1_airspeed","obsInLyr2_altitude"}
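For an environment with more than one observation channel, the network needs one input layer per channel, and ObservationInputNames lets you match each layer to its channel by name. The following sketch is not part of the original reference examples; the channel dimensions, layer names, and layer sizes are illustrative assumptions.
% Illustrative sketch: two observation channels, each mapped by name to its
% own network input layer. All names and sizes below are assumptions.
obsInfo2 = [rlNumericSpec([3 1]) rlNumericSpec([1 1])];
actInfo2 = rlNumericSpec([2 1]);
path1 = [featureInputLayer(3,Name="obsInLyr1_airspeed") fullyConnectedLayer(8,Name="fc1")];
path2 = [featureInputLayer(1,Name="obsInLyr2_altitude") fullyConnectedLayer(8,Name="fc2")];
common = [concatenationLayer(1,2,Name="cat") reluLayer fullyConnectedLayer(2)];
lg = layerGraph(path1);
lg = addLayers(lg,path2);
lg = addLayers(lg,common);
lg = connectLayers(lg,"fc1","cat/in1");
lg = connectLayers(lg,"fc2","cat/in2");
net2 = dlnetwork(lg);
actor2 = rlContinuousDeterministicActor(net2,obsInfo2,actInfo2, ...
    ObservationInputNames=["obsInLyr1_airspeed","obsInLyr2_altitude"]);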
Properties
ObservationInfo — Observation specifications
rlFiniteSetSpec object | rlNumericSpec object | array
Observation specifications, specified as an rlFiniteSetSpec or rlNumericSpec object or an array containing a mix of such objects. Each element in the array defines the properties of an environment observation channel, such as its dimensions, data type, and name.
When you create the approximator object, the constructor function sets the ObservationInfo property to the input argument observationInfo.
You can extract observationInfo
from an existing environment, function approximator, or agent using getObservationInfo. You can also construct the specifications manually using rlFiniteSetSpec or rlNumericSpec.
Example: [rlNumericSpec([2 1]) rlFiniteSetSpec([3,5,7])]
ActionInfo — Action specifications
rlNumericSpec object
Action specifications, specified as an rlNumericSpec object. This object defines the properties of the environment action channel, such as its dimensions, data type, and name.
Note
For this approximator object, only one action channel is allowed.
When you create the approximator object, the constructor function sets the ActionInfo property to the input argument actionInfo.
You can extract actionInfo from an existing environment, approximator object, or agent using getActionInfo. You can also construct the specification manually using rlNumericSpec.
Example: rlNumericSpec([2 1])
Normalization — Normalization method
"none" (default) | string array
Normalization method, returned as an array in which each element (one for each input channel defined in the observationInfo and actionInfo properties, in that order) is one of the following values:
"none"
— Do not normalize the input."rescale-zero-one"
— Normalize the input by rescaling it to the interval between 0 and 1. The normalized input Y is (_U_–Min
)./(UpperLimit
–LowerLimit
), where U is the nonnormalized input. Note that nonnormalized input values lower thanLowerLimit
result in normalized values lower than 0. Similarly, nonnormalized input values higher thanUpperLimit
result in normalized values higher than 1. Here,UpperLimit
andLowerLimit
are the corresponding properties defined in the specification object of the input channel."rescale-symmetric"
— Normalize the input by rescaling it to the interval between –1 and 1. The normalized input Y is 2(_U_–LowerLimit
)./(UpperLimit
–LowerLimit
) – 1, where U is the nonnormalized input. Note that nonnormalized input values lower thanLowerLimit
result in normalized values lower than –1. Similarly, nonnormalized input values higher thanUpperLimit
result in normalized values higher than 1. Here,UpperLimit
andLowerLimit
are the corresponding properties defined in the specification object of the input channel.
Note
When you specify the Normalization property of rlAgentInitializationOptions, normalization is applied only to the approximator input channels corresponding to rlNumericSpec specification objects in which both the UpperLimit and LowerLimit properties are defined. After you create the agent, you can use setNormalizer to assign normalizers that use any normalization method. For more information on normalizer objects, see rlNormalizer.
Example: "rescale-symmetric"
UseDevice — Computation device used for training and simulation
"cpu" (default) | "gpu"
Computation device used to perform operations such as gradient computation, parameter update, and prediction during training and simulation, specified as either "cpu" or "gpu".
The "gpu"
option requires both Parallel Computing Toolbox™ software and a CUDA® enabled NVIDIA® GPU. For more information on supported GPUs see GPU Computing Requirements (Parallel Computing Toolbox).
You can use gpuDevice (Parallel Computing Toolbox) to query or select a local GPU device to be used with MATLAB®.
Note
Training or simulating an agent on a GPU involves device-specific numerical round-off errors. Because of these errors, you can get different results on a GPU and on a CPU for the same operation.
To speed up training by using parallel processing over multiple cores, you do not need to use this argument. Instead, when training your agent, use an rlTrainingOptions object in which the UseParallel option is set to true. For more information about training using multicore processors and GPUs, see Train Agents Using Parallel Computing and GPUs.
Example: "gpu"
Learnables — Learnable parameters of approximator object
cell array of dlarray objects
Learnable parameters of the approximator object, specified as a cell array of dlarray objects. This property contains the learnable parameters of the approximation model used by the approximator object.
Example: {dlarray(rand(256,4)),dlarray(rand(256,1))}
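For instance, a brief sketch (not from the original page; the additive update is purely illustrative) of reading and writing these parameters through the listed object functions:
% Inspect the learnable parameters and write back a perturbed copy.
params = getLearnableParameters(actor);                      % cell array of dlarray objects
params = cellfun(@(p) p + 0.01,params,UniformOutput=false);  % illustrative update only
actor  = setLearnableParameters(actor,params);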
State — State of approximator object
cell array of dlarray objects
State of the approximator object, specified as a cell array of dlarray objects. For dlnetwork-based models, this property contains the Value column of the State property table of the dlnetwork model. The elements of the cell array are the state of the recurrent neural network used in the approximator (if any), as well as the state for the batch normalization layer (if used).
For model types that are not based on a dlnetwork object, this property is an empty cell array, since these model types do not support states.
Example: {dlarray(rand(256,1)),dlarray(rand(256,1))}
Object Functions
| Function | Description |
|---|---|
| rlDDPGAgent | Deep deterministic policy gradient (DDPG) reinforcement learning agent |
| rlTD3Agent | Twin-delayed deep deterministic (TD3) policy gradient reinforcement learning agent |
| getAction | Obtain action from agent, actor, or policy object given environment observations |
| evaluate | Evaluate function approximator object given observation (or observation-action) input data |
| gradient | (Not recommended) Evaluate gradient of function approximator object given observation and action input data |
| accelerate | (Not recommended) Option to accelerate computation of gradient for approximator object based on neural network |
| getLearnableParameters | Obtain learnable parameter values from agent, function approximator, or policy object |
| setLearnableParameters | Set learnable parameter values of agent, function approximator, or policy object |
| setModel | Set approximation model in function approximator object |
| getModel | Get approximation model from function approximator object |
Examples
Create Continuous Deterministic Actor from Deep Neural Network
Create an observation specification object (or alternatively use getObservationInfo
to extract the specification object from an environment). For this example, define the observation space as a continuous four-dimensional space, so that there is a single observation channel that carries a column vector containing four doubles.
obsInfo = rlNumericSpec([4 1]);
Create an action specification object (or alternatively use getActionInfo
to extract the specification object from an environment). For this example, define the action space as a continuous two-dimensional space, so that the action channel carries a column vector containing two doubles.
actInfo = rlNumericSpec([2 1]);
A continuous deterministic actor implements a parametrized deterministic policy for a continuous action space. This actor takes the current observation as input and returns as output an action that is a deterministic function of the observation.
To model the parametrized policy within the actor, use a neural network with one input layer (which receives the content of the environment observation channel, as specified by obsInfo
) and one output layer (which returns the action to the environment action channel, as specified by actInfo
).
Define the network as an array of layer objects, and get the dimension of the observation and action spaces from the environment specification objects.
net = [
    featureInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(32)
    reluLayer
    fullyConnectedLayer(actInfo.Dimension(1))
    ];
Convert the network to a dlnetwork
object and display the number of learnable parameters.
net = dlnetwork(net); summary(net)
Initialized: true
Number of learnables: 226
Inputs: 1 'input' 4 features
Create the actor object with rlContinuousDeterministicActor
, using the network and the observation and action specification objects as input arguments. The network input layer is automatically associated with the environment observation channel according to the dimension specifications in obsInfo
.
actor = rlContinuousDeterministicActor( ...
    net, ...
    obsInfo, ...
    actInfo)
actor = rlContinuousDeterministicActor with properties:
ObservationInfo: [1x1 rl.util.rlNumericSpec]
ActionInfo: [1x1 rl.util.rlNumericSpec]
Normalization: "none"
UseDevice: "cpu"
Learnables: {4x1 cell}
State: {0x1 cell}
To check your actor, use getAction
to return the action from a random observation, using the current network weights.
act = getAction(actor, ...
    {rand(obsInfo.Dimension)});
act{1}
ans = 2x1 single column vector
   -0.0684
   -0.2538
You can now use the actor (along with a critic) to create an agent for the environment described by the given specification objects. Examples of agents that can work with continuous action and observation spaces, and use a continuous deterministic actor, are rlDDPGAgent and rlTD3Agent.
For more information on creating approximator objects such as actors and critics, see Create Policies and Value Functions.
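As a rough sketch of that last step (this code is not part of the original example; the critic architecture, layer names, and sizes are assumptions), you can pair the actor with a Q-value critic and pass both to rlDDPGAgent.
% Illustrative sketch: build a simple Q-value critic and create a DDPG agent.
% All layer names and sizes below are assumptions, not part of the original example.
obsPath = [featureInputLayer(obsInfo.Dimension(1),Name="obsIn") fullyConnectedLayer(32,Name="obsFC")];
actPath = [featureInputLayer(actInfo.Dimension(1),Name="actIn") fullyConnectedLayer(32,Name="actFC")];
comPath = [additionLayer(2,Name="add") reluLayer fullyConnectedLayer(1)];
cg = layerGraph(obsPath);
cg = addLayers(cg,actPath);
cg = addLayers(cg,comPath);
cg = connectLayers(cg,"obsFC","add/in1");
cg = connectLayers(cg,"actFC","add/in2");
criticNet = dlnetwork(cg);
critic = rlQValueFunction(criticNet,obsInfo,actInfo, ...
    ObservationInputNames="obsIn",ActionInputNames="actIn");
agent = rlDDPGAgent(actor,critic);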
Create Continuous Deterministic Actor Specifying Network Input Layer Name
Create an observation specification object (or alternatively use getObservationInfo
to extract the specification object from an environment). For this example, define the observation space as a continuous four-dimensional space, so that there is a single observation channel that carries a column vector containing four doubles.
obsInfo = rlNumericSpec([4 1]);
Create an action specification object (or alternatively use getActionInfo
to extract the specification object from an environment). For this example, define the action space as a continuous two-dimensional space, so that the action channel carries a column vector containing two doubles.
actInfo = rlNumericSpec([2 1]);
A continuous deterministic actor implements a parametrized deterministic policy for a continuous action space. This actor takes the current observation as input and returns an action as output.
To model the parametrized policy within the actor, use a neural network with one input layer (which receives the content of the environment observation channel, as specified by obsInfo
) and one output layer (which returns the action to the environment action channel, as specified by actInfo
).
Define the network as an array of layer objects, and get the dimension of the observation and action spaces from the environment specification objects. Name the network input layer obsInLyr
so you can later explicitly associate it to the observation input channel.
net = [
    featureInputLayer(obsInfo.Dimension(1),Name="obsInLyr")
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(actInfo.Dimension(1))
    ];
Convert the network to a dlnetwork
object, and display the number of learnable parameters.
net = dlnetwork(net); summary(net)
Initialized: true
Number of learnables: 114
Inputs: 1 'obsInLyr' 4 features
Create the actor object with rlContinuousDeterministicActor
, using the network, the observation and action specification objects, and the name of the network input layer to be associated with the environment observation channel.
actor = rlContinuousDeterministicActor(net, ...
    obsInfo,actInfo, ...
    ObservationInputNames="obsInLyr")
actor = rlContinuousDeterministicActor with properties:
ObservationInfo: [1x1 rl.util.rlNumericSpec]
ActionInfo: [1x1 rl.util.rlNumericSpec]
Normalization: "none"
UseDevice: "cpu"
Learnables: {4x1 cell}
State: {0x1 cell}
To check your actor, use getAction
to return the action from a random observation, using the current network weights.
act = getAction(actor,{rand(obsInfo.Dimension)}); act{1}
ans = 2x1 single column vector
0.4013
0.0578
You can now use the actor (along with a critic) to create an agent for the environment described by the given specification objects. Examples of agents that can work with continuous action and observation spaces, and use a continuous deterministic actor, are rlDDPGAgent and rlTD3Agent.
For more information on creating approximator objects such as actors and critics, see Create Policies and Value Functions.
Create Continuous Deterministic Actor from Custom Basis Function
Create an observation specification object (or alternatively use getObservationInfo
to extract the specification object from an environment). For this example, define the observation space as consisting of two environment channels, the first containing a two-by-two continuous matrix and the second containing a scalar that can be only 0 or 1.
obsInfo = [rlNumericSpec([2 2]) rlFiniteSetSpec([0 1])];
Create a continuous action space specification object (or alternatively use getActionInfo
to extract the specification object from an environment). For this example, define the action space as a continuous three-dimensional space, so that the environment action channel carries a column vector containing three doubles.
actInfo = rlNumericSpec([3 1]);
A continuous deterministic actor implements a parametrized deterministic policy for a continuous action space. This actor takes a batch of observations as inputs and returns a corresponding batch of actions that are a deterministic function of the observations.
To model the parametrized policy within the actor, use a custom basis function with two input arguments (one for each observation channel). Here, the third dimension is the batch dimension. For each element of the batch dimension, the output of the basis function is a vector of four elements.
myBasisFcn = @(obsA,obsB) [ ...
    obsA(1,1,:)+obsB(1,1,:).^2; ...
    obsA(2,1,:)-obsB(1,1,:).^2; ...
    obsA(1,2,:).^2+obsB(1,1,:); ...
    obsA(2,2,:).^2-obsB(1,1,:)];
The output of the actor is the vector W'*myBasisFcn(obsA,obsB)
, which is the action taken as a result of the given observation. The weight matrix W
contains the learnable parameters and must have as many rows as the length of the basis function output and as many columns as the dimension of the action space.
Define an initial parameter matrix with as many rows as the length of the basis function output (four) and as many columns as the number of action dimensions (three).
W0 = rand(4,3);
Create the actor. The first argument is a two-element cell containing both the handle to the custom function and the initial weight matrix. The second and third arguments are, respectively, the observation and action specification objects.
actor = rlContinuousDeterministicActor({myBasisFcn,W0},obsInfo,actInfo)
actor = rlContinuousDeterministicActor with properties:
ObservationInfo: [2x1 rl.util.RLDataSpec]
ActionInfo: [1x1 rl.util.rlNumericSpec]
Normalization: ["none" "none"]
UseDevice: "cpu"
Learnables: {[3x4 dlarray]}
State: {}
To check your actor, use getAction
to return the action from a given observation, using the current parameter matrix.
a = getAction(actor,{rand(2,2),0})
a = 1x1 cell array
    {3x1 single}

a{1}
ans = 3x1 single column vector
1.9733
1.1479
2.2037
Note that the actor does not enforce the set constraint for the discrete set elements.
a = getAction(actor,{rand(2,2),-1}); a{1}
ans = 3x1 single column vector
2.0251
1.4035
2.2437
Obtain actions for a random batch of 10 observations.
a = getAction(actor,{rand(2,2,10),rand(1,1,10)});
Get the seventh action in the batch.
a{1}(:,:,7)
ans = 3x1 single column vector
1.1097
0.8339
1.2711
You can now use the actor (along with a critic) to create an agent for the environment described by the given specification objects. Examples of agents that can work with a mixed observation space, a continuous action space, and use a continuous deterministic actor, are rlDDPGAgent and rlTD3Agent.
For more information on creating approximator objects such as actors and critics, see Create Policies and Value Functions.
Create Deterministic Actor from Recurrent Neural Network
Create observation and action information. You can also obtain these specifications from an environment. For this example, define the observation space as a continuous four-dimensional space, so that a single observation channel carries a column vector containing four doubles, and the action space as a continuous two-dimensional space, so that the action channel carries a column vector containing two doubles.
obsInfo = rlNumericSpec([4 1]); actInfo = rlNumericSpec([2 1]);
A continuous deterministic actor implements a parametrized deterministic policy for a continuous action space. This actor takes the current observation as input and returns as output an action that is a deterministic function of the observation.
To model the parametrized policy within the actor, use a neural network with one input layer (which receives the content of the environment observation channel, as specified by obsInfo
) and one output layer (which returns the action to the environment action channel, as specified by actInfo
).
Define the network as an array of layer objects, and get the dimension of the observation and action spaces from the environment specification objects. Since this network is recurrent, use a sequenceInputLayer
as the input layer and include at least one lstmLayer
.
net = [
    sequenceInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(10)
    reluLayer
    lstmLayer(8,OutputMode="sequence")
    fullyConnectedLayer(20)
    fullyConnectedLayer(actInfo.Dimension(1))
    tanhLayer
    ];
Convert the network to a dlnetwork
object and display the number of learnable parameters.
net = dlnetwork(net); summary(net)
Initialized: true
Number of learnables: 880
Inputs: 1 'sequenceinput' Sequence input with 4 dimensions
Create a deterministic actor representation for the network.
actor = rlContinuousDeterministicActor( ...
    net, ...
    obsInfo, ...
    actInfo);
To check your actor, use getAction
to return the action from a random observation, given the current network weights.
a = getAction(actor, ...
    {rand(obsInfo.Dimension)});
a{1}
ans = 2x1 single column vector
   -0.0742
    0.0158
You can use dot notation to extract and set the current state of the recurrent neural network in the actor.
actor.State
ans=2×1 cell array
    {8x1 dlarray}
    {8x1 dlarray}

actor.State = { dlarray(-0.1*rand(8,1)) dlarray(0.1*rand(8,1)) };
To evaluate the actor using sequential observations, use the sequence length (time) dimension. For example, obtain actions for 5 independent sequences, each consisting of 9 sequential observations.
[action,state] = getAction(actor, ...
    {rand([obsInfo.Dimension 5 9])});
Display the action corresponding to the seventh element of the observation sequence in the fourth sequence.
action = action{1}; action(1,1,4,7)
Display the updated state of the recurrent neural network.
state
state=2×1 cell array
    {8x5 single}
    {8x5 single}
You can now use the actor (along with a critic) to create an agent for the environment described by the given specification objects. Examples of agents that can work with continuous action and observation spaces, and use a continuous deterministic actor, are rlDDPGAgent and rlTD3Agent.
For more information on input and output format for recurrent neural networks, see the Algorithms section of lstmLayer. For more information on creating approximator objects such as actors and critics, see Create Policies and Value Functions.
Version History
Introduced in R2022a
See Also
Functions
Objects
- rlNumericSpec | rlFiniteSetSpec | rlContinuousGaussianActor | rlDiscreteCategoricalActor | rlHybridStochasticActor | rlDeterministicActorPolicy | rlAdditiveNoisePolicy | rlTD3Agent | rlDDPGAgent