evaluate

Evaluate function approximator object given observation (or observation-action) input data

Since R2022a

Syntax

outData = evaluate(fcnAppx,inData)
[outData,nextState] = evaluate(fcnAppx,inData)
___ = evaluate(___,UseForward=useForward)

Description

outData = evaluate(fcnAppx,inData) evaluates the function approximator object (that is, the actor or critic) fcnAppx given the input value inData. It returns the output value outData.

[outData,nextState] = evaluate(fcnAppx,inData) also returns the updated state of fcnAppx when it contains a recurrent neural network.


___ = evaluate(___,UseForward=useForward) allows you to explicitly call a forward pass when computing gradients.

Examples


Evaluate Function Approximator Object

This example shows you how to evaluate a function approximator object (that is, an actor or a critic). For this example, the function approximator object is a discrete categorical actor and you evaluate it given some observation data, obtaining in return the action probability distribution and the updated network state.

Load the same environment used in Train PG Agent to Balance Discrete Cart-Pole System, and obtain the observation and action specifications.

env = rlPredefinedEnv("CartPole-Discrete"); obsInfo = getObservationInfo(env)

obsInfo = rlNumericSpec with properties:

 LowerLimit: -Inf
 UpperLimit: Inf
       Name: "CartPole States"
Description: "x, dx, theta, dtheta"
  Dimension: [4 1]
   DataType: "double"

actInfo = getActionInfo(env)

actInfo = rlFiniteSetSpec with properties:

   Elements: [-10 10]
       Name: "CartPole Action"
Description: [0x0 string]
  Dimension: [1 1]
   DataType: "double"

To approximate the policy within the actor, use a recurrent deep neural network. Define the network as an array of layer objects. Get the dimensions of the observation space and the number of possible actions directly from the environment specification objects.

net = [
    sequenceInputLayer(prod(obsInfo.Dimension))
    fullyConnectedLayer(8)
    reluLayer
    lstmLayer(8,OutputMode="sequence")
    fullyConnectedLayer(numel(actInfo.Elements))
    ];

Convert the network to a dlnetwork object and display the number of weights.

net = dlnetwork(net); summary(net)

Initialized: true

Number of learnables: 602

Inputs: 1 'sequenceinput' Sequence input with 4 dimensions

Create a stochastic actor representation for the network.

actor = rlDiscreteCategoricalActor(net,obsInfo,actInfo);

Use evaluate to return the probability of each of the two possible actions. Note that the type of the returned numbers is single, not double.

[prob,state] = evaluate(actor,{rand(obsInfo.Dimension)}); prob{1}

ans = 2x1 single column vector

0.4847
0.5153

Since a recurrent neural network is used for the actor, the second output argument, representing the updated state of the neural network, is not empty. In this case, it contains the updated (cell and hidden) states for the eight units of the lstm layer used in the network.

state{1}

ans = 8x1 single column vector

   -0.0833
    0.0619
   -0.0066
   -0.0651
    0.0714
   -0.0957
    0.0614
   -0.0326

state{2}

ans = 8x1 single column vector

   -0.1367
    0.1142
   -0.0158
   -0.1820
    0.1305
   -0.1779
    0.0947
   -0.0833

You can use dot notation to extract and set the current state of the recurrent neural network in the actor.

actor.State

ans = 2×1 cell array
    {8x1 dlarray}
    {8x1 dlarray}

actor.State = {dlarray(-0.1*rand(8,1)), dlarray(0.1*rand(8,1))};

You can obtain action probabilities and updated states for a batch of observations. For example, use a batch of five independent observations.

obsBatch = reshape(1:20,4,1,5,1); [prob,state] = evaluate(actor,{obsBatch})

prob = 1x1 cell array
    {2x5 single}

state = 2×1 cell array
    {8x5 single}
    {8x5 single}

The output arguments contain action probabilities and updated states for each observation in the batch.

prob{1}

ans = 2x5 single matrix

    0.5303    0.5911    0.6083    0.6158    0.6190
    0.4697    0.4089    0.3917    0.3842    0.3810

Note that the actor treats observation data along the batch length dimension independently, not sequentially. For example, evaluate the actor on the same observations taken in a different order, and check that the returned probabilities are permuted in the same way.

prob = evaluate(actor,{obsBatch(:,:,[5 4 3 1 2])}); prob{1}

ans = 2x5 single matrix

0.6190    0.6158    0.6083    0.5303    0.5911
0.3810    0.3842    0.3917    0.4697    0.4089

To evaluate the actor using sequential observations, use the sequence length (time) dimension. For example, obtain action probabilities for five independent sequences, each one made of nine sequential observations.

[prob,state] = evaluate(actor, ...
    {rand([obsInfo.Dimension 5 9])})

prob = 1x1 cell array
    {2x5x9 single}

state = 2×1 cell array
    {8x5 single}
    {8x5 single}

The first output argument contains a vector of two probabilities (first dimension) for each element of the observation batch (second dimension) and for each time element of the sequence length (third dimension).

The second output argument contains two vectors of final states for each observation batch (that is, the network maintains a separate state history for each observation batch).

Display the probability of the second action, after the seventh sequential observation in the fourth independent batch.

prob{1}(2,4,7)

For more information on input and output format for recurrent neural networks, see the Algorithms section of lstmLayer.

Input Arguments

collapse all

inData — Input data for function approximator

cell array

Input data for the function approximator, specified as a cell array with as many elements as the number of input channels of fcnAppx. In the following, NO is the number of observation channels.

Each element of inData must be a matrix of dimension MC-by-LB-by-LS, where:

- MC corresponds to the dimensions of the associated input channel.
- LB is the batch size. To specify a single observation, set LB = 1. To specify a batch of observations, set LB > 1.
- LS is the sequence length (along the time dimension) for a recurrent neural network. If fcnAppx does not use a recurrent neural network, then LS = 1.

For more information on input and output formats for recurrent neural networks, see the Algorithms section of lstmLayer.

Example: {rand(8,3,64,1),rand(4,1,64,1),rand(2,1,64,1)}
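For instance, a minimal sketch of building inData for the single-channel recurrent actor created in the example above (the batch size of 64 and the sequence length of 1 are arbitrary illustration values):

% One observation channel of dimension [4 1], a batch of 64 observations,
% and a sequence length of 1.
inData = {rand([obsInfo.Dimension 64 1])};
outData = evaluate(actor,inData);   % outData{1} is 2-by-64, one probability vector per batch element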

useForward — Option to use forward pass

false (default) | true

Option to use forward pass, specified as a logical value. When you specify UseForward=true, the function calculates its outputs using forward instead of predict. This allows layers such as batch normalization and dropout to appropriately change their behavior for training.

Example: true
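For instance, a minimal sketch using the actor and observation specification from the example above (the random observation is only for illustration):

% Compute the output with a forward pass rather than a prediction pass, so
% that layers such as dropout and batch normalization use their training
% behavior. This is typically useful inside a custom loss when computing gradients.
prob = evaluate(actor,{rand(obsInfo.Dimension)},UseForward=true);
prob{1}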

Output Arguments

collapse all

outData — Output data from evaluation of function approximator object

cell array

Output data from the evaluation of the function approximator object, returned as a cell array. The size and contents of outData depend on the type of object you use for fcnAppx. For example, for the discrete categorical actor used in the example above, outData contains the probability of each possible action. Here, NO is the number of observation channels.

Each element of outData is a matrix of dimensions D-by-LB-by-LS, where:

- D corresponds to the dimensions of the associated output channel.
- LB is the batch size.
- LS is the sequence length (along the time dimension) for a recurrent neural network. If fcnAppx does not use a recurrent neural network, then LS = 1.
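For instance, a sketch of checking these dimensions for the recurrent discrete categorical actor from the example above, evaluated on a batch of five sequences of nine time steps each:

% D = 2 possible actions, LB = 5 independent sequences, LS = 9 time steps.
[outData,~] = evaluate(actor,{rand([obsInfo.Dimension 5 9])});
size(outData{1})   % returns [2 5 9], that is, D-by-LB-by-LS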

Note

If fcnAppx is an rlContinuousDeterministicRewardFunction object, then evaluate behaves identically to predict except that it returns results inside a single-cell array. If fcnAppx is an rlContinuousDeterministicTransitionFunction object, then evaluate behaves identically to predict. If fcnAppx is an rlContinuousGaussianTransitionFunction object, then evaluate returns the mean value and standard deviation of the observation probability distribution, while predict returns an observation sampled from this distribution. Similarly, for an rlContinuousGaussianRewardFunction object, evaluate returns the mean value and standard deviation of the reward probability distribution, while predict returns a reward sampled from this distribution. Finally, if fcnAppx is an rlIsDoneFunction object, then evaluate returns the probabilities of the termination status being false or true, respectively, while predict returns a predicted termination status sampled with these probabilities.

nextState — Updated state of function approximator object

cell array

Next state of the function approximator object, returned as a cell array. If fcnAppx does not use a recurrent neural network (which is the case for environment function approximators), then nextState is an empty cell array.

You can set the state of the approximator to state using dot notation. For example:

fcnAppx.State = state;

Version History

Introduced in R2022a
