syncParameters - Modify the learnable parameters of one approximator towards the learnable parameters of another approximator - MATLAB ([original](https://www.mathworks.com/help/reinforcement-learning/ref/rl.function.rlcontinuousdeterministicactor.syncparameters.html))
Modify the learnable parameters of one approximator towards the learnable parameters of another approximator
Since R2022a
Syntax
zFcnAppx = syncParameters(xFcnAppx,yFcnAppx,smoothFactor)
Description
zFcnAppx = syncParameters(xFcnAppx,yFcnAppx,smoothFactor) returns an updated function approximator object of the same type and configuration as xFcnAppx, but with its learnable parameters updated towards those of yFcnAppx, according to the smooth factor smoothFactor.
Examples
For this example, create two value function critics and sync their parameters.
First, create a finite set observation specification for a scalar that can take four different values.
obsInfo = rlFiniteSetSpec(1:4);
Create a table object. Table values are initialized to zero by default.
table = rlTable(obsInfo);
Create a base critic.
Vx = rlValueFunction(table,obsInfo);
Set the table values to different values.
table.Table = [1 -1 -10 100]';
Use the updated table to create a new critic.
Vy = rlValueFunction(table,obsInfo);
Sync the parameter values of the base critic Vx, moving them one fifth of the way towards the parameter values of the new critic Vy.
Vz = syncParameters(Vx,Vy,0.2);
Display the learnable parameters of the new critic Vz.
ans = 4×1 dlarray
0.2000
-0.2000
-2.0000
20.0000
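These numbers can be checked by hand with the update rule Pz = s*Py + (1-s)*Px given in the description of smoothFactor. The following is a minimal sketch that uses plain numeric arrays rather than approximator objects; the variable names Px, Py, Pz, and s are illustrative and are not part of the toolbox API.

```matlab
% Plain-array stand-ins for the learnable parameters (illustrative names).
Px = zeros(4,1);          % base critic Vx: table initialized to zero
Py = [1 -1 -10 100]';     % new critic Vy: table values set in the example
s  = 0.2;                 % smooth factor passed to syncParameters

% Move Px one fifth of the way towards Py.
Pz = s*Py + (1 - s)*Px;
disp(Pz)
```

The result is 0.2, -0.2, -2, and 20, which matches the parameters displayed for Vz.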
Input Arguments
yFcnAppx: New actor or critic object, specified as a function approximator object with a parameter cell array having the same dimensions as that of xFcnAppx.
smoothFactor: Smooth factor, specified as a positive scalar less than or equal to one. This factor regulates the extent to which the parameters of xFcnAppx are updated towards the parameters of yFcnAppx. The operation is akin to a single step of a first-order low-pass filter applied to the learnable parameters of xFcnAppx.
Specifically, if $P_z$ is the parameter vector of zFcnAppx, then:
$$P_z = s P_y + (1 - s) P_x$$
where $s$ is the smooth factor, and $P_y$ and $P_x$ are the parameter vectors of yFcnAppx and xFcnAppx, respectively.
For example, if you use a smooth factor of 1, the parameters of zFcnAppx are equal to the parameters of yFcnAppx. If you use a smooth factor of 0.5, the parameters of zFcnAppx are equal to the average of the parameters of yFcnAppx and xFcnAppx.
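To illustrate the low-pass filter analogy, the sketch below applies the same update repeatedly to plain numeric arrays, feeding each result back in as the new starting point. It is only an arithmetic illustration of the update rule and does not call syncParameters; the names Px0, Py, s, n, and PxClosed are hypothetical. After n steps, the remaining gap to the target parameters has shrunk by the factor (1-s)^n.

```matlab
Px0 = zeros(4,1);         % starting parameters (stand-in for xFcnAppx)
Py  = [1 -1 -10 100]';    % target parameters (stand-in for yFcnAppx)
s   = 0.2;                % smooth factor
n   = 10;                 % number of repeated updates

Px = Px0;
for k = 1:n
    % One step of the first-order low-pass update.
    Px = s*Py + (1 - s)*Px;
end

% Closed form of the same recursion, for comparison.
PxClosed = Py + (1 - s)^n * (Px0 - Py);
gap = max(abs(Px - PxClosed));
```

A smaller smooth factor makes the parameters track the target more slowly, which is why this kind of update is often described as a soft or smoothed update.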
Output Arguments
zFcnAppx: Updated actor or critic object, returned as a function approximator object of the same type as xFcnAppx. The learnable parameter values of zFcnAppx are set to a convex combination of those in xFcnAppx and those in yFcnAppx. For example, as specified in the description of smoothFactor, a smooth factor of 1 results in zFcnAppx parameters equal to the yFcnAppx parameters, while a smooth factor of 0.5 results in zFcnAppx parameters equal to the average of the parameters in xFcnAppx and yFcnAppx.
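One way to confirm the convex combination numerically is to read the parameters back with getLearnableParameters and compare them with the update formula. The sketch below assumes that Reinforcement Learning Toolbox and Deep Learning Toolbox are installed, and that the first cell of each returned cell array holds the table values, as in the example on this page. The tolerance and the names s, expected, err, and isClose are arbitrary choices made for illustration.

```matlab
% Recreate the critics from the example.
obsInfo = rlFiniteSetSpec(1:4);
table = rlTable(obsInfo);
Vx = rlValueFunction(table,obsInfo);
table.Table = [1 -1 -10 100]';
Vy = rlValueFunction(table,obsInfo);

s = 0.2;
Vz = syncParameters(Vx,Vy,s);

% Learnable parameters are returned as cell arrays of dlarray objects.
Px = getLearnableParameters(Vx);
Py = getLearnableParameters(Vy);
Pz = getLearnableParameters(Vz);

% Compare the first parameter array with the convex combination.
expected = s*Py{1} + (1 - s)*Px{1};
err = max(abs(extractdata(Pz{1} - expected)));
isClose = err < 1e-4;
```

A loose tolerance is used here because the displayed parameters are dlarray values, which may be stored in single precision.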
Version History
Introduced in R2022a
See Also
Functions
- getLearnableParameters | setLearnableParameters | getActor | getCritic | setActor | setCritic | getModel | setModel | evaluate | update
Objects
- dlnetwork | rlValueFunction | rlQValueFunction | rlVectorQValueFunction | rlContinuousDeterministicActor | rlDiscreteCategoricalActor | rlContinuousGaussianActor | rlContinuousDeterministicTransitionFunction | rlContinuousGaussianTransitionFunction | rlContinuousDeterministicRewardFunction | rlContinuousGaussianRewardFunction | rlIsDoneFunction