syncParameters - Modify the learnable parameters of one approximator towards the learnable parameters of another approximator - MATLAB ([original](https://www.mathworks.com/help/reinforcement-learning/ref/rl.function.rlcontinuousdeterministicactor.syncparameters.html))

Modify the learnable parameters of one approximator towards the learnable parameters of another approximator

Since R2022a

Syntax

Description

[zFcnAppx](#mw%5F63cfeabf-16b3-46da-9c19-23e3f782d3da) = syncParameters([xFcnAppx](#mw%5F06f5c0a5-502b-4aa3-a5b2-7e82b0ce410c),[yFcnAppx](#mw%5Fbd5a9936-4114-4f06-b006-c0d92c4cd677),[smoothFactor](#mw%5Fb991b1fb-5510-41ae-bcf1-99d4e7509716)) returns an updated function approximator object of the same type and configuration as xFcnAppx, but with its learnable parameters updated towards those of yFcnAppx, according to the smooth factor smoothFactor.

Examples

For this example, create two value function critics and sync their parameters.

First, create a finite-set observation specification for a scalar that can take four different values.

obsInfo = rlFiniteSetSpec(1:4);

Create a table object. Table values are initialized to zero by default.

table = rlTable(obsInfo);

Create a base critic.

Vx = rlValueFunction(table,obsInfo);

Set the table values to different values.

table.Table = [1 -1 -10 100]';

Use the updated table to create a new critic.

Vy = rlValueFunction(table,obsInfo);

Sync the parameter values of the base critic Vx, moving them by one fifth of the way towards the parameter values of the new critic Vy.

Vz = syncParameters(Vx,Vy,0.2);

Display the learnable parameters of the new critic Vz.

ans = 4×1 dlarray

0.2000

-0.2000

-2.0000

20.0000
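
The displayed values can be checked by hand with plain arrays. The following is a minimal sketch that does not call any toolbox function; the names Px, Py, s, and Pz are illustrative and only mirror the table values and smooth factor used in this example.

```matlab
% Plain-array check of the parameter update; no toolbox objects involved.
% Px, Py, s, and Pz are illustrative names, not toolbox properties.
Px = zeros(4,1);          % table values of Vx (initialized to zero)
Py = [1 -1 -10 100]';     % table values of Vy
s  = 0.2;                 % smooth factor passed to syncParameters

% Move Px one fifth of the way towards Py.
Pz = s*Py + (1-s)*Px;
disp(Pz)
```

Because the base critic starts at zero, each resulting entry is simply 0.2 times the corresponding entry of the new critic, which agrees with the values 0.2, -0.2, -2, and 20 shown above.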

Input Arguments

New actor or critic object (yFcnAppx), specified as a function approximator object with a parameter cell array having the same dimensions as that of xFcnAppx.

Smooth factor, specified as a positive scalar less than or equal to 1. This factor regulates the extent to which the parameters of xFcnAppx are updated towards the parameters of yFcnAppx. The operation is akin to a single step of a first-order low-pass filter applied to the learnable parameters of xFcnAppx.

Specifically, if $P_z$ is the parameter vector of zFcnAppx and $s$ is the smooth factor, then:

$P_z = s P_y + (1 - s) P_x$

where $P_y$ and $P_x$ are the parameter vectors of yFcnAppx and xFcnAppx, respectively.

For example, if you use a smooth factor of 1, the parameters of zFcnAppx are equal to the parameters of yFcnAppx. If you use a smooth factor of 0.5, the parameters of zFcnAppx are equal to the average of the parameters of yFcnAppx and xFcnAppx.
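
The two cases above follow directly from the update rule. As a short worked derivation (using $s$ for the smooth factor and $P_x$, $P_y$, $P_z$ for the parameter vectors of xFcnAppx, yFcnAppx, and zFcnAppx), the rule can be rearranged into a step from $P_x$ towards $P_y$, which is the low-pass filter reading of the operation:

$$
\begin{aligned}
P_z &= s P_y + (1 - s) P_x = P_x + s\,(P_y - P_x) \\
s = 1 &: \quad P_z = P_x + (P_y - P_x) = P_y \\
s = 0.5 &: \quad P_z = P_x + \tfrac{1}{2}(P_y - P_x) = \tfrac{1}{2}(P_x + P_y)
\end{aligned}
$$

In this form, $s$ is the fraction of the distance from $P_x$ to $P_y$ covered by one call.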

Output Arguments

Updated target actor or critic object, returned as a function approximator object of the same type as xFcnAppx. The learnable parameter values of zFcnAppx are set to a convex combination of the ones in xFcnAppx and the ones in yFcnAppx. For example, as specified in the description of smoothFactor, using a smooth factor of 1 results in zFcnAppx parameters equal to the yFcnAppx parameters, while using a smooth factor of 0.5 results in zFcnAppx parameters equal to the average of the parameters in xFcnAppx and yFcnAppx.

Version History

Introduced in R2022a
