rlStochasticActorRepresentation - (Not recommended) Stochastic actor representation for reinforcement learning agents - MATLAB ([original](https://www.mathworks.com/help/reinforcement-learning/ref/rl.representation.rlstochasticactorrepresentation.html))
(Not recommended) Stochastic actor representation for reinforcement learning agents
Since R2020a
Description
This object implements a function approximator to be used as a stochastic actor within a reinforcement learning agent. A stochastic actor takes the observations as inputs and returns a random action, thereby implementing a stochastic policy with a specific probability distribution. After you create an rlStochasticActorRepresentation object, use it to create a suitable agent, such as an rlACAgent or rlPGAgent agent. For more information on creating representations, see Create Policies and Value Functions.
Creation
Syntax
Description
Discrete Action Space Stochastic Actor
`discActor` = rlStochasticActorRepresentation(`net`,`observationInfo`,`discActionInfo`,'Observation',`obsName`)
creates a stochastic actor with a discrete action space, using the deep neural network `net` as function approximator. Here, the output layer of `net` must have as many elements as the number of possible discrete actions. This syntax sets the ObservationInfo and ActionInfo properties of `discActor` to the inputs `observationInfo` and `discActionInfo`, respectively. `obsName` must contain the names of the input layers of `net`.
`discActor` = rlStochasticActorRepresentation({`basisFcn`,`W0`},`observationInfo`,`actionInfo`)
creates a discrete action space stochastic actor using a custom basis function as underlying approximator. The first input argument is a two-element cell array in which the first element contains the handle `basisFcn` to a custom basis function, and the second element contains the initial weight matrix `W0`. This syntax sets the ObservationInfo and ActionInfo properties of `discActor` to the inputs `observationInfo` and `actionInfo`, respectively.
`discActor` = rlStochasticActorRepresentation(___,`options`)
creates the discrete action space stochastic actor `discActor` using the additional options set `options`, which is an rlRepresentationOptions object. This syntax sets the Options property of `discActor` to the `options` input argument. You can use this syntax with any of the previous input-argument combinations.
Continuous Action Space Gaussian Actor
`contActor` = rlStochasticActorRepresentation(`net`,`observationInfo`,`contActionInfo`,'Observation',`obsName`)
creates a Gaussian stochastic actor with a continuous action space using the deep neural network `net` as function approximator. Here, the output layer of `net` must have twice as many elements as the number of dimensions of the continuous action space. This syntax sets the ObservationInfo and ActionInfo properties of `contActor` to the inputs `observationInfo` and `contActionInfo`, respectively. `obsName` must contain the names of the input layers of `net`.
Note
`contActor` does not enforce the constraints set by the action specification. Therefore, when using this actor, you must enforce action space constraints within the environment.
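For example, inside a custom environment step function you can saturate the action received from the actor before using it. The following line is a minimal sketch, assuming the action limits are stored in an rlNumericSpec object named actInfo (the variable names are illustrative):

% Clip the action returned by the Gaussian actor to the declared limits
% before applying it to the environment dynamics.
clippedAction = max(min(action,actInfo.UpperLimit),actInfo.LowerLimit);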
`contActor` = rlStochasticActorRepresentation(___,`options`)
creates the continuous action space Gaussian actor `contActor` using the additional options set `options`, which is an rlRepresentationOptions object. This syntax sets the Options property of `contActor` to the `options` input argument. You can use this syntax with any of the previous input-argument combinations.
Input Arguments
net
— Deep neural network
array of Layer objects | layerGraph object | DAGNetwork object | SeriesNetwork object | dlnetwork object
Deep neural network used as the underlying approximator within the actor, specified as one of the following:
- Array of Layer objects
- layerGraph object
- DAGNetwork object
- SeriesNetwork object
- dlnetwork object
For a discrete action space stochastic actor, net must have the observations as input and a single output layer having as many elements as the number of possible discrete actions. Each element represents the probability (which must be nonnegative) of executing the corresponding action.
For a continuous action space stochastic actor, net must have the observations as input and a single output layer having twice as many elements as the number of dimensions of the continuous action space. The elements of the output vector represent all the mean values followed by all the standard deviations (which must be nonnegative) of the Gaussian distributions for the dimensions of the action space.
Note
Because the standard deviations must be nonnegative while the mean values must fall within the output range, the network must have two separate paths. The first path produces the estimates of the mean values, so any output nonlinearity must be scaled so that its output falls in the desired range. The second path produces the estimates of the standard deviations, so you must use a softplus or ReLU layer to enforce nonnegativity. For an example, see Create Continuous Stochastic Actor from Deep Neural Network.
The network input layers must be in the same order and with the same data type and dimensions as the signals defined in ObservationInfo. Also, the names of these input layers must match the observation names specified in obsName. The network output layer must have the same data type and dimensions as the signal defined in ActionInfo.
rlStochasticActorRepresentation objects support recurrent deep neural networks.
For a list of deep neural network layers, see List of Deep Learning Layers. For more information on creating deep neural networks for reinforcement learning, see Create Policies and Value Functions.
obsName
— Observation names
string | character vector | cell array of character vectors
Observation names, specified as a cell array of strings or character vectors. The observation names must be the names of the input layers in net.
Example: {'my_obs'}
basisFcn
— Custom basis function
function handle
Custom basis function, specified as a function handle to a user-defined MATLAB function. The user-defined function can either be an anonymous function or a function on the MATLAB path. The output of the actor is the vector a = softmax(W'*B), where W is a weight matrix and B is the column vector returned by the custom basis function. Each element of a represents the probability of taking the corresponding action. The learnable parameters of the actor are the elements of W.
When creating a stochastic actor representation, your basis function must have the following signature.
B = myBasisFunction(obs1,obs2,...,obsN)
Here, obs1 to obsN are observations in the same order and with the same data type and dimensions as the signals defined in observationInfo.
Example: @(obs1,obs2,obs3) [obs3(2)*obs1(1)^2; abs(obs2(5)+obs3(1))]
W0
— Initial value of the basis function weights
matrix
Initial value of the basis function weights W, specified as a matrix. It must have as many rows as the length of the basis function output and as many columns as the number of possible actions.
Properties
Options
— Representation options
rlRepresentationOptions object
Representation options, specified as an rlRepresentationOptions object. Available options include the optimizer used for training and the learning rate.
Note
TRPO agents use only the Options.UseDevice representation option and ignore the other training and learning rate options.
ObservationInfo
— Observation specifications
rlFiniteSetSpec object | rlNumericSpec object | array
Observation specifications, specified as an rlFiniteSetSpec or rlNumericSpec object or an array containing a mix of such objects. These objects define properties such as the dimensions, data type, and names of the observation signals.
rlStochasticActorRepresentation sets the ObservationInfo property of contActor or discActor to the input observationInfo.
You can extract observationInfo from an existing environment or agent using getObservationInfo. You can also construct the specifications manually using rlFiniteSetSpec or rlNumericSpec.
ActionInfo
— Action specifications
rlFiniteSetSpec object | rlNumericSpec object
Action specifications, specified as an rlFiniteSetSpec or rlNumericSpec object. These objects define properties such as the dimensions, data type and name of the action signals.
For a discrete action space actor, rlStochasticActorRepresentation sets ActionInfo to the input discActionInfo, which must be an rlFiniteSetSpec object.
For a continuous action space actor, rlStochasticActorRepresentation sets ActionInfo to the input contActionInfo, which must be an rlNumericSpec object.
You can extract ActionInfo from an existing environment or agent using getActionInfo. You can also construct the specification manually.
For custom basis function representations, the action signal must be a scalar, a column vector, or a discrete action.
Object Functions
rlACAgent | Actor-critic (AC) reinforcement learning agent |
---|---|
rlPGAgent | Policy gradient (PG) reinforcement learning agent |
rlPPOAgent | Proximal policy optimization (PPO) reinforcement learning agent |
rlSACAgent | Soft actor-critic (SAC) reinforcement learning agent |
getAction | Obtain action from agent, actor, or policy object given environment observations |
Examples
Create Discrete Stochastic Actor from Deep Neural Network
Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous four-dimensional space, so that a single observation is a column vector containing four doubles.
obsInfo = rlNumericSpec([4 1]);
Create an action specification object (or alternatively use getActionInfo to extract the specification object from an environment). For this example, define the action space as consisting of three values, -10, 0, and 10.
actInfo = rlFiniteSetSpec([-10 0 10]);
Create a deep neural network approximator for the actor. The input of the network (here called state) must accept a four-element vector (the observation vector just defined by obsInfo), and its output (here called actionProb) must be a three-element vector. Each element of the output vector must be between 0 and 1 since it represents the probability of executing each of the three possible actions (as defined by actInfo). Using softmax as the output layer enforces this requirement.
net = [
    featureInputLayer(4,'Normalization','none','Name','state')
    fullyConnectedLayer(3,'Name','fc')
    softmaxLayer('Name','actionProb')
    ];
Create the actor with rlStochasticActorRepresentation, using the network, the observation and action specification objects, and the name of the network input layer.
actor = rlStochasticActorRepresentation(net,obsInfo,actInfo,...
    'Observation','state')
actor = rlStochasticActorRepresentation with properties:
ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
ObservationInfo: [1x1 rl.util.rlNumericSpec]
Options: [1x1 rl.option.rlRepresentationOptions]
To validate your actor, use getAction to return a random action from the observation vector [1 1 1 1], using the current network weights.
act = getAction(actor,{[1 1 1 1]});
act{1}
You can now use the actor to create a suitable agent, such as an rlACAgent or rlPGAgent agent.
Create Continuous Stochastic Actor from Deep Neural Network
Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous six-dimensional space, so that a single observation is a column vector containing six doubles.
obsInfo = rlNumericSpec([6 1]);
Create an action specification object (or alternatively use getActionInfo to extract the specification object from an environment). For this example, define the action space as a continuous two-dimensional space, so that a single action is a column vector containing two doubles, both between -10 and 10.
actInfo = rlNumericSpec([2 1],'LowerLimit',-10,'UpperLimit',10);
Create a deep neural network approximator for the actor. The observation input (here called myobs) must accept a six-dimensional vector (the observation vector just defined by obsInfo). The output (here called myact) must be a four-dimensional vector (twice the number of dimensions defined by actInfo). The elements of the output vector represent, in sequence, all the means and all the standard deviations of every action.
The fact that standard deviations must be non-negative while mean values must fall within the output range means that the network must have two separate paths. The first path is for the mean values, and any output nonlinearity must be scaled so that it can produce values in the output range. The second path is for the standard deviations, and you can use a softplus or ReLU layer to enforce non-negativity.
% Input path layers (6-by-1 input and 2-by-1 output)
inPath = [
    imageInputLayer([6 1 1],'Normalization','none','Name','myobs')
    fullyConnectedLayer(2,'Name','infc')
    ];

% Path layers for the mean values (2-by-1 input and 2-by-1 output)
% Use a scalingLayer to scale the range
meanPath = [
    tanhLayer('Name','tanh')                                  % range: (-1,1)
    scalingLayer('Name','scale','Scale',actInfo.UpperLimit)   % range: (-10,10)
    ];

% Path layers for the standard deviations (2-by-1 input and output)
% Use a softplus layer to make them nonnegative
sdevPath = softplusLayer('Name','splus');

% Concatenate the two inputs (along dimension #3)
% to form a single (4-by-1) output layer
outLayer = concatenationLayer(3,2,'Name','mean&sdev');

% Add layers to a layerGraph network object
net = layerGraph(inPath);
net = addLayers(net,meanPath);
net = addLayers(net,sdevPath);
net = addLayers(net,outLayer);
% Connect layers: the mean value path output MUST
% be connected to the FIRST input of the concatenationLayer

% Connect output of inPath to meanPath input
net = connectLayers(net,'infc','tanh/in');
% Connect output of inPath to sdevPath input
net = connectLayers(net,'infc','splus/in');
% Connect output of meanPath to the first mean&sdev input
net = connectLayers(net,'scale','mean&sdev/in1');
% Connect output of sdevPath to the second mean&sdev input
net = connectLayers(net,'splus','mean&sdev/in2');
Set some training options for the actor.
actorOpts = rlRepresentationOptions('LearnRate',8e-3,'GradientThreshold',1);
Create the actor with rlStochasticActorRepresentation, using the network, the observation and action specification objects, the name of the network input layer, and the options object.
actor = rlStochasticActorRepresentation(net,obsInfo,actInfo,...
    'Observation','myobs',actorOpts)
actor = rlStochasticActorRepresentation with properties:
ActionInfo: [1x1 rl.util.rlNumericSpec]
ObservationInfo: [1x1 rl.util.rlNumericSpec]
Options: [1x1 rl.option.rlRepresentationOptions]
To check your actor, use getAction to return a random action from the observation vector ones(6,1), using the current network weights.
act = getAction(actor,{ones(6,1)});
act{1}
ans = 2x1 single column vector
   -0.0763
    9.6860
You can now use the actor to create a suitable agent (such as an rlACAgent, rlPGAgent, or rlPPOAgent agent).
Create Stochastic Actor from Custom Basis Function
Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous two-dimensional space, so that a single observation is a column vector containing two doubles.
obsInfo = rlNumericSpec([2 1]);
The stochastic actor based on a custom basis function does not support continuous action spaces. Therefore, create a discrete action space specification object (or alternatively use getActionInfo to extract the specification object from an environment with a discrete action space). For this example, define the action space as a finite set consisting of three possible values (named 7, 5, and 3 in this case).
actInfo = rlFiniteSetSpec([7 5 3]);
Create a custom basis function. Each element is a function of the observations defined by obsInfo.
myBasisFcn = @(myobs) [myobs(2)^2; myobs(1); exp(myobs(2)); abs(myobs(1))]
myBasisFcn = function_handle with value:
    @(myobs)[myobs(2)^2;myobs(1);exp(myobs(2));abs(myobs(1))]
The output of the actor is an action from the ones defined in actInfo; the probability of each action is given by the corresponding element of the vector softmax(W'*myBasisFcn(myobs)). W is a weight matrix, containing the learnable parameters, which must have as many rows as the length of the basis function output and as many columns as the number of possible actions.
Define an initial parameter matrix. It must have as many rows as the basis function output (four) and as many columns as the number of possible actions (three), for example a random matrix.
W0 = rand(4,3);
Create the actor. The first argument is a two-element cell containing both the handle to the custom function and the initial parameter matrix. The second and third arguments are, respectively, the observation and action specification objects.
actor = rlStochasticActorRepresentation({myBasisFcn,W0},obsInfo,actInfo)
actor = rlStochasticActorRepresentation with properties:
ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
ObservationInfo: [1x1 rl.util.rlNumericSpec]
Options: [1x1 rl.option.rlRepresentationOptions]
To check your actor, use the getAction function to return one of the three possible actions, depending on a given random observation and on the current parameter matrix.
v = getAction(actor,{rand(2,1)})
You can now use the actor (along with a critic) to create a suitable discrete action space agent.
Create Stochastic Actor with Recurrent Neural Network
For this example, you create a stochastic actor with a discrete action space using a recurrent neural network. You can also use a recurrent neural network for a continuous stochastic actor using the same method.
Create an environment and obtain observation and action information.
env = rlPredefinedEnv('CartPole-Discrete');
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
numObs = obsInfo.Dimension(1);
numDiscreteAct = numel(actInfo.Elements);
Create a recurrent deep neural network for the actor. To create a recurrent neural network, use a sequenceInputLayer as the input layer and include at least one lstmLayer.
actorNetwork = [
    sequenceInputLayer(numObs,'Normalization','none','Name','state')
    fullyConnectedLayer(8,'Name','fc')
    reluLayer('Name','relu')
    lstmLayer(8,'OutputMode','sequence','Name','lstm')
    fullyConnectedLayer(numDiscreteAct,'Name','output')
    softmaxLayer('Name','actionProb')];
Create a stochastic actor representation for the network.
actorOptions = rlRepresentationOptions('LearnRate',1e-3,...
    'GradientThreshold',1);
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo,...
    'Observation','state',actorOptions);
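To validate the actor, you can, for example, use getAction to return an action from a random observation, using the current network weights.
act = getAction(actor,{rand(numObs,1)});
act{1}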
Version History
Introduced in R2020a
R2022a: rlStochasticActorRepresentation is not recommended
rlStochasticActorRepresentation is not recommended. Use either rlDiscreteCategoricalActor or rlContinuousGaussianActor instead.
The following table shows some typical uses of rlStochasticActorRepresentation to create _neural network_-based actors, and how to update your code with one of the new stochastic actor approximator objects instead. The first table entry builds an actor with a discrete action space, the second one builds an actor with a continuous action space.
Network-Based Stochastic Actor Representation: Not Recommended | Network-Based Stochastic Actor Approximator: Recommended |
---|---|
myActor = rlStochasticActorRepresentation(net,obsInfo,actInfo,'Observation',obsNames), with actInfo defining a discrete action space and net having observations as inputs and a single output layer with as many elements as the number of possible discrete actions. | myActor = rlDiscreteCategoricalActor(net,obsInfo,actInfo,'ObservationInputNames',obsNames). Use this syntax to create a stochastic actor object with a discrete action space. This actor samples its action from a categorical (also known as Multinoulli) distribution. |
myActor = rlStochasticActorRepresentation(net,obsInfo,actInfo,'Observation',obsNames), with actInfo defining a continuous action space and net having observations as inputs and a single output layer with twice as many elements as the number of dimensions of the continuous action space (representing, in sequence, all the means and all the standard deviations of every action dimension). | myActor = rlContinuousGaussianActor(net,obsInfo,actInfo,'ObservationInputNames',obsNames,'ActionMeanOutputNames',actMeanNames,'ActionStandardDeviationOutputNames',actStdNames). Use this syntax to create a stochastic actor object with a continuous action space. This actor samples its action from a Gaussian distribution, and you must provide the names of the network outputs representing the mean and standard deviations of the action. |
The following table shows a typical use of rlStochasticActorRepresentation to create a (discrete action space) actor which uses a (linear in the learnable parameters) custom basis function, and how to update your code with rlDiscreteCategoricalActor instead. In these function calls, the first input argument is a two-element cell array containing both the handle to the custom basis function and the initial weight vector or matrix.
Custom Basis Stochastic Actor Representation: Not Recommended | Custom Basis Function-Based Stochastic Actor Approximator: Recommended |
---|---|
rep = rlStochasticActorRepresentation({basisFcn,W0},obsInfo,actInfo), where the basis function has observations as inputs and actions as outputs, W0 is a matrix with as many columns as the number of possible actions, and actInfo defines a discrete action space. | rep = rlDiscreteCategoricalActor({basisFcn,W0},obsInfo,actInfo). Use this syntax to create a stochastic actor object with a discrete action space which returns an action sampled from a categorical (also known as Multinoulli) distribution. |
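For example, migrating the custom basis function actor from the example above might look like the following sketch (assuming myBasisFcn and W0 are defined as in the Create Stochastic Actor from Custom Basis Function example):

% Not recommended: representation object
oldActor = rlStochasticActorRepresentation({myBasisFcn,W0},obsInfo,actInfo);
% Recommended: categorical actor approximator object
newActor = rlDiscreteCategoricalActor({myBasisFcn,W0},obsInfo,actInfo);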