rlStochasticActorRepresentation - (Not recommended) Stochastic actor representation for reinforcement learning agents


(Not recommended) Stochastic actor representation for reinforcement learning agents

Since R2020a

Description

This object implements a function approximator to be used as a stochastic actor within a reinforcement learning agent. A stochastic actor takes the observations as inputs and returns a random action, thereby implementing a stochastic policy with a specific probability distribution. After you create an rlStochasticActorRepresentation object, use it to create a suitable agent, such as an rlACAgent or rlPGAgent agent. For more information on creating representations, see Create Policies and Value Functions.

Creation

Syntax

Description

Discrete Action Space Stochastic Actor

`discActor` = rlStochasticActorRepresentation(net,`observationInfo`,`discActionInfo`,'Observation',obsName) creates a stochastic actor with a discrete action space, using the deep neural network net as function approximator. Here, the output layer of net must have as many elements as the number of possible discrete actions. This syntax sets the ObservationInfo and ActionInfo properties of discActor to the inputs observationInfo and discActionInfo, respectively. obsName must contain the names of the input layers of net.


`discActor` = rlStochasticActorRepresentation({basisFcn,W0},`observationInfo`,`actionInfo`) creates a discrete action space stochastic actor using a custom basis function as underlying approximator. The first input argument is a two-element cell array whose first element contains the handle basisFcn to a custom basis function and whose second element contains the initial weight matrix W0. This syntax sets the ObservationInfo and ActionInfo properties of discActor to the inputs observationInfo and actionInfo, respectively.


`discActor` = rlStochasticActorRepresentation(___,`options`) creates the discrete action space stochastic actor discActor using the additional option set options, which is an rlRepresentationOptions object. This syntax sets the Options property of discActor to the options input argument. You can use this syntax with any of the previous input-argument combinations.

Continuous Action Space Gaussian Actor

`contActor` = rlStochasticActorRepresentation(net,`observationInfo`,`contActionInfo`,'Observation',obsName) creates a Gaussian stochastic actor with a continuous action space using the deep neural network net as function approximator. Here, the output layer of net must have twice as many elements as the number of dimensions of the continuous action space. This syntax sets the ObservationInfo and ActionInfo properties of contActor to the inputs observationInfo and contActionInfo, respectively. obsName must contain the names of the input layers of net.

Note

contActor does not enforce constraints set by the action specification; therefore, when using this actor, you must enforce action space constraints within the environment.


`contActor` = rlStochasticActorRepresentation(___,`options`) creates the continuous action space Gaussian actor contActor using the additional option set options, which is an rlRepresentationOptions object. This syntax sets the Options property of contActor to the options input argument. You can use this syntax with any of the previous input-argument combinations.

Input Arguments


net — Deep neural network

array of Layer objects | layerGraph object | DAGNetwork object | SeriesNetwork object | dlNetwork object

Deep neural network used as the underlying approximator within the actor, specified as an array of Layer objects or as a layerGraph, DAGNetwork, SeriesNetwork, or dlNetwork object. The network requirements depend on the action space:

For a discrete action space stochastic actor, net must have the observations as input and a single output layer having as many elements as the number of possible discrete actions. Each element represents the probability (which must be nonnegative) of executing the corresponding action.

For a continuous action space stochastic actor, net must have the observations as input and a single output layer having twice as many elements as the number of dimensions of the continuous action space. The elements of the output vector represent all the mean values followed by all the standard deviations (which must be nonnegative) of the Gaussian distributions for the dimensions of the action space.

Note

The fact that standard deviations must be nonnegative while mean values must fall within the output range means that the network must have two separate paths. The first path must produce an estimation for the mean values, so any output nonlinearity must be scaled so that its output falls in the desired range. The second path must produce an estimation for the standard deviations, so you must use a softplus or ReLU layer to enforce nonnegativity.

The network input layers must be in the same order and with the same data type and dimensions as the signals defined in ObservationInfo. Also, the names of these input layers must match the observation names specified in obsName. The network output layer must have the same data type and dimension as the signal defined in ActionInfo.

rlStochasticActorRepresentation objects support recurrent deep neural networks.

For a list of deep neural network layers, see List of Deep Learning Layers. For more information on creating deep neural networks for reinforcement learning, see Create Policies and Value Functions.
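For instance, assuming a four-element observation and three possible discrete actions (the layer names here are arbitrary placeholders), a minimal network satisfying the discrete-actor requirements described above might look like the following sketch.

net = [
    featureInputLayer(4,'Normalization','none','Name','myobs')
    fullyConnectedLayer(3,'Name','fc')
    softmaxLayer('Name','prob')   % three nonnegative probabilities summing to 1
    ];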

obsName — Observation names

string | character vector | cell array of character vectors

Observation names, specified as a cell array of strings or character vectors. The observation names must be the names of the input layers in net.

Example: {'my_obs'}

basisFcn — Custom basis function

function handle

Custom basis function, specified as a function handle to a user-defined MATLAB function. The user-defined function can either be an anonymous function or a function on the MATLAB path. The output of the actor is the vector a = softmax(W'*B), where W is a weight matrix and B is the column vector returned by the custom basis function. Each element of a represents the probability of taking the corresponding action. The learnable parameters of the actor are the elements of W.

When creating a stochastic actor representation, your basis function must have the following signature.

B = myBasisFunction(obs1,obs2,...,obsN)

Here obs1 to obsN are observations in the same order and with the same data type and dimensions as the signals defined in observationInfo.

Example: @(obs1,obs2,obs3) [obs3(2)*obs1(1)^2; abs(obs2(5)+obs3(1))]

W0 — Initial value of the basis function weights

column vector

Initial value of the basis function weights, W, specified as a matrix. It must have as many rows as the length of the basis function output, and as many columns as the number of possible actions.
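For instance, assuming a two-element observation specification, an action specification with three possible actions, and a basis function returning a four-element vector, W0 must be a 4-by-3 matrix. A minimal sketch, with placeholder specifications and basis function:

% placeholder two-element observation and three-action specifications
obsInfo = rlNumericSpec([2 1]);
actInfo = rlFiniteSetSpec([1 2 3]);
% basis function returning a four-element column vector
basisFcn = @(obs) [obs(1); obs(2); obs(1)*obs(2); obs(2)^2];
% W0 rows = length of basis output (4), columns = number of actions (3)
W0 = rand(4,3);
actor = rlStochasticActorRepresentation({basisFcn,W0},obsInfo,actInfo);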

Properties


Options — Representation options

rlRepresentationOptions object

Representation options, specified as an rlRepresentationOptions object. Available options include the optimizer used for training and the learning rate.

Note

TRPO agents use only the Options.UseDevice representation option and ignore the other training and learning rate options.

ObservationInfo — Observation specifications

rlFiniteSetSpec object | rlNumericSpec object | array

Observation specifications, specified as an rlFiniteSetSpec or rlNumericSpec object or an array containing a mix of such objects. These objects define properties such as the dimensions, data types, and names of the observation signals.

rlStochasticActorRepresentation sets the ObservationInfo property of contActor or discActor to the input observationInfo.

You can extract observationInfo from an existing environment or agent using getObservationInfo. You can also construct the specifications manually using rlFiniteSetSpec or rlNumericSpec.

ActionInfo — Action specifications

rlFiniteSetSpec object | rlNumericSpec object

Action specifications, specified as an rlFiniteSetSpec or rlNumericSpec object. These objects define properties such as the dimensions, data type and name of the action signals.

For a discrete action space actor, rlStochasticActorRepresentation sets ActionInfo to the input discActionInfo, which must be an rlFiniteSetSpec object.

For a continuous action space actor, rlStochasticActorRepresentation sets ActionInfo to the input contActionInfo, which must be an rlNumericSpec object.

You can extract ActionInfo from an existing environment or agent using getActionInfo. You can also construct the specification manually.

For custom basis function representations, the action signal must be a scalar, a column vector, or a discrete action.

Object Functions

rlACAgent Actor-critic (AC) reinforcement learning agent
rlPGAgent Policy gradient (PG) reinforcement learning agent
rlPPOAgent Proximal policy optimization (PPO) reinforcement learning agent
rlSACAgent Soft actor-critic (SAC) reinforcement learning agent
getAction Obtain action from agent, actor, or policy object given environment observations

Examples


Create Discrete Stochastic Actor from Deep Neural Network

Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous four-dimensional space, so that a single observation is a column vector containing four doubles.

obsInfo = rlNumericSpec([4 1]);

Create an action specification object (or alternatively use getActionInfo to extract the specification object from an environment). For this example, define the action space as consisting of three values, -10, 0, and 10.

actInfo = rlFiniteSetSpec([-10 0 10]);

Create a deep neural network approximator for the actor. The input of the network (here called state) must accept a four-element vector (the observation vector just defined by obsInfo), and its output (here called actionProb) must be a three-element vector. Each element of the output vector must be between 0 and 1 since it represents the probability of executing each of the three possible actions (as defined by actInfo). Using softmax as the output layer enforces this requirement.

net = [
    featureInputLayer(4,'Normalization','none','Name','state')
    fullyConnectedLayer(3,'Name','fc')
    softmaxLayer('Name','actionProb')
    ];

Create the actor with rlStochasticActorRepresentation, using the network, the observation and action specification objects, and the name of the network input layer.

actor = rlStochasticActorRepresentation(net,obsInfo,actInfo,...
    'Observation','state')

actor = rlStochasticActorRepresentation with properties:

     ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
ObservationInfo: [1x1 rl.util.rlNumericSpec]
        Options: [1x1 rl.option.rlRepresentationOptions]

To validate your actor, use getAction to return a random action from the observation vector [1 1 1 1], using the current network weights.

act = getAction(actor,{[1 1 1 1]});
act{1}

You can now use the actor to create a suitable agent, such as an rlACAgent or rlPGAgent agent.
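For instance, assuming default agent options, the following sketch creates a policy gradient agent directly from this actor.

agent = rlPGAgent(actor);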

Create Continuous Stochastic Actor from Deep Neural Network

Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous six-dimensional space, so that a single observation is a column vector containing 6 doubles.

obsInfo = rlNumericSpec([6 1]);

Create an action specification object (or alternatively use getActionInfo to extract the specification object from an environment). For this example, define the action space as a continuous two-dimensional space, so that a single action is a column vector containing two doubles, both between -10 and 10.

actInfo = rlNumericSpec([2 1],'LowerLimit',-10,'UpperLimit',10);

Create a deep neural network approximator for the actor. The observation input (here called myobs) must accept a six-dimensional vector (the observation vector just defined by obsInfo). The output (here called myact) must be a four-dimensional vector (twice the number of dimensions defined by actInfo). The elements of the output vector represent, in sequence, all the means and all the standard deviations of every action.

The fact that standard deviations must be non-negative while mean values must fall within the output range means that the network must have two separate paths. The first path is for the mean values, and any output nonlinearity must be scaled so that it can produce values in the output range. The second path is for the standard deviations, and you can use a softplus or ReLU layer to enforce non-negativity.

% input path layers (6 by 1 input and a 2 by 1 output)
inPath = [
    imageInputLayer([6 1 1],'Normalization','none','Name','myobs')
    fullyConnectedLayer(2,'Name','infc')
    ];

% path layers for mean value (2 by 1 input and 2 by 1 output)
% using scalingLayer to scale the range
meanPath = [
    tanhLayer('Name','tanh')                                  % range: (-1,1)
    scalingLayer('Name','scale','Scale',actInfo.UpperLimit)   % range: (-10,10)
    ];

% path layers for standard deviations (2 by 1 input and output)
% using softplusLayer to make them nonnegative
sdevPath = softplusLayer('Name','splus');

% concatenate the two paths (along dimension #3)
% to form a single (4 by 1) output layer
outLayer = concatenationLayer(3,2,'Name','mean&sdev');

% add layers to network object
net = layerGraph(inPath);
net = addLayers(net,meanPath);
net = addLayers(net,sdevPath);
net = addLayers(net,outLayer);

% connect layers: the mean value path output MUST
% be connected to the FIRST input of the concatenationLayer

% connect output of inPath to meanPath input
net = connectLayers(net,'infc','tanh/in');
% connect output of inPath to sdevPath input
net = connectLayers(net,'infc','splus/in');
% connect output of meanPath to the concatenation layer input #1
net = connectLayers(net,'scale','mean&sdev/in1');
% connect output of sdevPath to the concatenation layer input #2
net = connectLayers(net,'splus','mean&sdev/in2');

Set some training options for the actor.

actorOpts = rlRepresentationOptions('LearnRate',8e-3,'GradientThreshold',1);

Create the actor with rlStochasticActorRepresentation, using the network, the observation and action specification objects, the name of the network input layer, and the options object.

actor = rlStochasticActorRepresentation(net, obsInfo, actInfo, 'Observation','myobs',actorOpts)

actor = rlStochasticActorRepresentation with properties:

     ActionInfo: [1x1 rl.util.rlNumericSpec]
ObservationInfo: [1x1 rl.util.rlNumericSpec]
        Options: [1x1 rl.option.rlRepresentationOptions]

To check your actor, use getAction to return a random action from the observation vector ones(6,1), using the current network weights.

act = getAction(actor,{ones(6,1)});
act{1}

ans = 2x1 single column vector

   -0.0763
    9.6860

You can now use the actor to create a suitable agent (such as an rlACAgent, rlPGAgent, or rlPPOAgent agent).

Create Stochastic Actor from Custom Basis Function

Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous two-dimensional space, so that a single observation is a column vector containing two doubles.

obsInfo = rlNumericSpec([2 1]);

The stochastic actor based on a custom basis function does not support continuous action spaces. Therefore, create a discrete action space specification object (or alternatively use getActionInfo to extract the specification object from an environment with a discrete action space). For this example, define the action space as a finite set consisting of three possible values (named 7, 5, and 3 in this case).

actInfo = rlFiniteSetSpec([7 5 3]);

Create a custom basis function. Each element is a function of the observations defined by obsInfo.

myBasisFcn = @(myobs) [myobs(2)^2; myobs(1); exp(myobs(2)); abs(myobs(1))]

myBasisFcn = function_handle with value:
    @(myobs)[myobs(2)^2;myobs(1);exp(myobs(2));abs(myobs(1))]

The actor returns one of the actions defined in actInfo, sampled according to the probability distribution softmax(W'*myBasisFcn(myobs)). W is a weight matrix, containing the learnable parameters, which must have as many rows as the length of the basis function output and as many columns as the number of possible actions.

Define an initial parameter matrix. It must have as many rows as the length of the basis function output (four) and as many columns as the number of possible actions (three). For example, initialize it with random values.
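W0 = rand(4,3);  % for example, a random 4-by-3 initial matrix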

Create the actor. The first argument is a two-element cell containing both the handle to the custom function and the initial parameter matrix. The second and third arguments are, respectively, the observation and action specification objects.

actor = rlStochasticActorRepresentation({myBasisFcn,W0},obsInfo,actInfo)

actor = rlStochasticActorRepresentation with properties:

     ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
ObservationInfo: [1x1 rl.util.rlNumericSpec]
        Options: [1x1 rl.option.rlRepresentationOptions]

To check your actor, use the getAction function to return one of the three possible actions, depending on a given random observation and on the current parameter matrix.

v = getAction(actor,{rand(2,1)})

You can now use the actor (along with a critic) to create a suitable discrete action space agent.

Create Stochastic Actor with Recurrent Neural Network

For this example, you create a stochastic actor with a discrete action space using a recurrent neural network. You can also use a recurrent neural network for a continuous stochastic actor using the same method.

Create an environment and obtain observation and action information.

env = rlPredefinedEnv('CartPole-Discrete');
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
numObs = obsInfo.Dimension(1);
numDiscreteAct = numel(actInfo.Elements);

Create a recurrent deep neural network for the actor. To create a recurrent neural network, use a sequenceInputLayer as the input layer and include at least one lstmLayer.

actorNetwork = [
    sequenceInputLayer(numObs,'Normalization','none','Name','state')
    fullyConnectedLayer(8,'Name','fc')
    reluLayer('Name','relu')
    lstmLayer(8,'OutputMode','sequence','Name','lstm')
    fullyConnectedLayer(numDiscreteAct,'Name','output')
    softmaxLayer('Name','actionProb')
    ];

Create a stochastic actor representation for the network.

actorOptions = rlRepresentationOptions('LearnRate',1e-3,...
    'GradientThreshold',1);
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo,...
    'Observation','state',actorOptions);
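As a quick check, you can, for example, obtain an action from a random observation using the current network weights.

act = getAction(actor,{rand(numObs,1)});
act{1}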

Version History

Introduced in R2020a


rlStochasticActorRepresentation is not recommended. Use either rlDiscreteCategoricalActor or rlContinuousGaussianActor instead.

The following table shows some typical uses of rlStochasticActorRepresentation to create _neural network_-based actors, and how to update your code with one of the new stochastic actor approximator objects instead. The first table entry builds an actor with a discrete action space, and the second one builds an actor with a continuous action space. A short code sketch following the table illustrates the first case.

| Network-Based Stochastic Actor Representation: Not Recommended | Network-Based Stochastic Actor Approximator: Recommended |
| --- | --- |
| `myActor = rlStochasticActorRepresentation(net,obsInfo,actInfo,'Observation',obsNames)`, with `actInfo` defining a discrete action space and `net` having observations as inputs and a single output layer with as many elements as the number of possible discrete actions. | `myActor = rlDiscreteCategoricalActor(net,obsInfo,actInfo,'ObservationInputNames',obsNames)`. Use this syntax to create a stochastic actor object with a discrete action space. This actor samples its action from a categorical (also known as Multinoulli) distribution. |
| `myActor = rlStochasticActorRepresentation(net,obsInfo,actInfo,'Observation',obsNames)`, with `actInfo` defining a continuous action space and `net` having observations as inputs and a single output layer with twice as many elements as the number of dimensions of the continuous action space (representing, in sequence, all the means and all the standard deviations of every action dimension). | `myActor = rlContinuousGaussianActor(net,obsInfo,actInfo,'ObservationInputNames',obsNames,'ActionMeanOutputNames',actMeanNames,'ActionStandardDeviationOutputNames',actStdNames)`. Use this syntax to create a stochastic actor object with a continuous action space. This actor samples its action from a Gaussian distribution, and you must provide the names of the network outputs representing the mean values and standard deviations of the action. |
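For instance, for the discrete case, assuming a network net whose observation input layer is named 'state' (a placeholder name), the update looks like the following sketch.

% not recommended
actor = rlStochasticActorRepresentation(net,obsInfo,actInfo,'Observation','state');
% recommended
actor = rlDiscreteCategoricalActor(net,obsInfo,actInfo,'ObservationInputNames','state');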

The following table shows a typical use of rlStochasticActorRepresentation to create a (discrete action space) actor that uses a custom basis function (linear in the learnable parameters), and how to update your code with rlDiscreteCategoricalActor instead. In these function calls, the first input argument is a two-element cell array containing both the handle to the custom basis function and the initial weight vector or matrix.

| Custom Basis Stochastic Actor Representation: Not Recommended | Custom Basis Function-Based Stochastic Actor Approximator: Recommended |
| --- | --- |
| `rep = rlStochasticActorRepresentation({basisFcn,W0},obsInfo,actInfo)`, where the basis function has observations as inputs and actions as outputs, `W0` is a matrix with as many columns as the number of possible actions, and `actInfo` defines a discrete action space. | `rep = rlDiscreteCategoricalActor({basisFcn,W0},obsInfo,actInfo)`. Use this syntax to create a stochastic actor object with a discrete action space which returns an action sampled from a categorical (also known as Multinoulli) distribution. |