gru - Gated recurrent unit - MATLAB
Gated recurrent unit
Since R2020a
Syntax
Description
The gated recurrent unit (GRU) operation allows a network to learn dependencies between time steps in time series and sequence data.
Note
This function applies the deep learning GRU operation to dlarray data. If you want to apply a GRU operation within a dlnetwork object, use gruLayer.
Y = gru(X,H0,weights,recurrentWeights,bias) applies a gated recurrent unit (GRU) calculation to input X using the initial hidden state H0, and parameters weights, recurrentWeights, and bias. The input X must be a formatted dlarray. The output Y is a formatted dlarray with the same dimension format as X, except for any "S" dimensions.
The gru function updates the hidden state using the hyperbolic tangent function (tanh) as the state activation function. The gru function uses the sigmoid function, given by σ(x) = (1 + e^−x)^−1, as the gate activation function.
[Y,hiddenState] = gru(X,H0,weights,recurrentWeights,bias) also returns the hidden state after the GRU operation.
___ = gru(X,H0,weights,recurrentWeights,bias,DataFormat=FMT) also specifies the dimension format FMT when X is not a formatted dlarray. The output Y is an unformatted dlarray with the same dimension order as X, except for any "S" dimensions.
___ = gru(X,H0,weights,recurrentWeights,bias,Name=Value) specifies additional options using one or more name-value arguments.
Examples
Apply GRU Operation to Sequence Data
Perform a GRU operation using 100 hidden units.
Create the input sequence data as 32 observations with ten channels and a sequence length of 64.
numFeatures = 10; numObservations = 32; sequenceLength = 64;
X = randn(numFeatures,numObservations,sequenceLength); X = dlarray(X,"CBT");
Create the initial hidden state with 100 hidden units. Use the same initial hidden state for all observations.
numHiddenUnits = 100; H0 = zeros(numHiddenUnits,1);
Create the learnable parameters for the GRU operation.
weights = dlarray(randn(3*numHiddenUnits,numFeatures)); recurrentWeights = dlarray(randn(3*numHiddenUnits,numHiddenUnits)); bias = dlarray(randn(3*numHiddenUnits,1));
Perform the GRU calculation.
[Y,hiddenState] = gru(X,H0,weights,recurrentWeights,bias);
View the size and dimension format of the output.
View the size of the hidden state.
You can use the hidden state to keep track of the state of the GRU operation and input further sequential data.
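The two "View the ..." steps above can be sketched in code. size and dims are standard functions for dlarray data, and the sizes in the comments follow from the dimensions chosen earlier in this example:

```matlab
% View the size and dimension format of the output.
size(Y)   % 100-by-32-by-64 (hidden units, observations, time steps)
dims(Y)   % 'CBT'

% View the size of the hidden state.
size(hiddenState)   % 100-by-32: one hidden state vector per observation
```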
Input Arguments
X
— Input data
dlarray | numeric array
Input data, specified as a formatted dlarray, an unformatted dlarray, or a numeric array. When X is not a formatted dlarray, you must specify the dimension label format using the DataFormat name-value argument. If X is a numeric array, at least one of H0, weights, recurrentWeights, or bias must be a dlarray.
X must contain a sequence dimension labeled "T". If X has any spatial dimensions labeled "S", they are flattened into the "C" (channel) dimension. If X does not have a channel dimension, then one is added. If X has any unspecified dimensions labeled "U", they must be singleton.
Data Types: single | double
H0
— Initial hidden state vector
dlarray | numeric array
Initial hidden state vector, specified as a formatted dlarray, an unformatted dlarray, or a numeric array.
If H0 is a formatted dlarray, it must contain a channel dimension labeled "C" and, optionally, a batch dimension labeled "B" with the same size as the "B" dimension of X. If H0 does not have a "B" dimension, the function uses the same hidden state vector for each observation in X.
If H0 is a formatted dlarray, then the size of the "C" dimension determines the number of hidden units. Otherwise, the size of the first dimension determines the number of hidden units.
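For instance, to give each observation its own initial state, you can make H0 a formatted dlarray with a "B" dimension. The sizes here are illustrative, matching the example earlier on this page:

```matlab
numHiddenUnits = 100;
numObservations = 32;

% One initial hidden state vector per observation, labeled "CB".
H0 = dlarray(zeros(numHiddenUnits,numObservations),"CB");
```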
Data Types: single | double
weights
— Weights
dlarray | numeric array
Weights, specified as a formatted dlarray, an unformatted dlarray, or a numeric array.
Specify weights as a matrix of size 3*NumHiddenUnits-by-InputSize, where NumHiddenUnits is the size of the "C" dimension of H0, and InputSize is the size of the "C" dimension of X multiplied by the size of each "S" dimension of X, where present.
If weights is a formatted dlarray, it must contain a "C" dimension of size 3*NumHiddenUnits and a "U" dimension of size InputSize.
Data Types: single | double
recurrentWeights
— Recurrent weights
dlarray | numeric array
Recurrent weights, specified as a formatted dlarray, an unformatted dlarray, or a numeric array.
Specify recurrentWeights as a matrix of size 3*NumHiddenUnits-by-NumHiddenUnits, where NumHiddenUnits is the size of the "C" dimension of H0.
If recurrentWeights is a formatted dlarray, it must contain a "C" dimension of size 3*NumHiddenUnits and a "U" dimension of size NumHiddenUnits.
Data Types: single | double
bias
— Bias
dlarray vector | numeric vector
Bias, specified as a formatted dlarray, an unformatted dlarray, or a numeric array.
Specify bias as a vector of length 3*NumHiddenUnits, where NumHiddenUnits is the size of the "C" dimension of H0.
If bias is a formatted dlarray, the nonsingleton dimension must be labeled "C".
Data Types: single | double
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: Y = gru(X,H0,weights,recurrentWeights,bias,DataFormat="CTB") applies the GRU operation and specifies that the data has format "CTB" (channel, time, batch).
DataFormat
— Description of data dimensions
character vector | string scalar
Description of the data dimensions, specified as a character vector or string scalar.
A data format is a string of characters, where each character describes the type of the corresponding data dimension.
The characters are:
"S"
— Spatial"C"
— Channel"B"
— Batch"T"
— Time"U"
— Unspecified
For example, consider an array containing a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively. You can specify that this array has the format "CBT" (channel, batch, time).
You can specify multiple dimensions labeled "S" or "U". You can use the labels "C", "B", and "T" at most once each. The software ignores singleton trailing "U" dimensions after the second dimension.
If the input data is not a formatted dlarray object, then you must specify the DataFormat option.
For more information, see Deep Learning Data Formats.
Data Types: char | string
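As a sketch of this option in use (the sizes here are illustrative assumptions, not part of the reference page), you can pass a plain numeric array for X and describe its layout with DataFormat instead of formatting it:

```matlab
numFeatures = 10; numObservations = 32; sequenceLength = 64;
numHiddenUnits = 100;

X = randn(numFeatures,numObservations,sequenceLength);  % plain numeric array
H0 = dlarray(zeros(numHiddenUnits,1));                  % at least one input must be a dlarray
weights = dlarray(randn(3*numHiddenUnits,numFeatures));
recurrentWeights = dlarray(randn(3*numHiddenUnits,numHiddenUnits));
bias = dlarray(randn(3*numHiddenUnits,1));

% Describe the layout of X (channel, batch, time) rather than labeling X itself.
Y = gru(X,H0,weights,recurrentWeights,bias,DataFormat="CBT");
```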
ResetGateMode
— Reset gate mode
"after-multiplication"
(default) | "before-multiplication"
| "recurrent-bias-after-multiplication"
Since R2023a
Reset gate mode, specified as one of these values:
"after-multiplication"
— Apply the reset gate after matrix multiplication. This option is cuDNN compatible."before-multiplication"
— Apply the reset gate before matrix multiplication."recurrent-bias-after-multiplication"
— Apply the reset gate after matrix multiplication and use an additional set of bias terms for the recurrent weights.
For more information about the reset gate calculations, see Gated Recurrent Unit Layer on the gruLayer reference page.
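For example, a minimal sketch of selecting a nondefault reset gate mode, using inputs shaped as in the example earlier on this page:

```matlab
% Apply the reset gate before the recurrent matrix multiplication.
[Y,hiddenState] = gru(X,H0,weights,recurrentWeights,bias, ...
    ResetGateMode="before-multiplication");
```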
StateActivationFunction
— State activation function
"tanh"
(default) | "softsign"
| "relu"
Since R2024a
Activation function to update the hidden state, specified as one of these values:
"tanh"
— Use the hyperbolic tangent function (tanh)."softsign"
— Use the softsign function, softsign(x)=x1+|x|."relu"
(since R2024b) — Use the rectified linear unit (ReLU) function ReLU(x)={x,x>00,x≤0.
The software uses this option as the function σs in the calculations to update the hidden state.
For more information, see Gated Recurrent Unit Layer on the gruLayer reference page.
GateActivationFunction
— Gate activation function
"sigmoid"
(default) | "hard-sigmoid"
Since R2024a
Activation function to apply to the gates, specified as one of these values:
"sigmoid"
— Use the sigmoid function, σ(x)=(1+e−x)−1."hard-sigmoid"
— Use the hard sigmoid function,
The software uses this option as the function σg in the calculations for the layer gates.
For more information, see Gated Recurrent Unit Layer on the gruLayer reference page.
Output Arguments
Y
— GRU output
dlarray
GRU output, returned as a dlarray. The output Y has the same underlying data type as the input X.
If the input data X is a formatted dlarray, Y has the same dimension format as X, except for any "S" dimensions. If the input data is not a formatted dlarray, Y is an unformatted dlarray with the same dimension order as the input data.
The size of the "C" dimension of Y is the same as the number of hidden units, specified by the size of the "C" dimension of H0.
hiddenState
— Hidden state vector
dlarray | numeric array
Hidden state vector for each observation, returned as a dlarray or a numeric array with the same data type as H0.
If the input H0 is a formatted dlarray, then the output hiddenState is a formatted dlarray with the format "CB".
More About
Gated Recurrent Unit
The GRU operation allows a network to learn dependencies between time steps in time series and sequence data. For more information, see Gated Recurrent Unit Layer on the gruLayer reference page.
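As a sketch, one common formulation of the per-time-step GRU update (this matches the cuDNN-style convention used by the default "after-multiplication" reset gate mode; W, R, and b denote the weights, recurrentWeights, and bias arguments split into reset (r), update (z), and candidate (h) blocks, σ_g is the gate activation function, and σ_s is the state activation function):

```latex
\begin{align*}
r_t &= \sigma_g\!\left(W_r x_t + R_r h_{t-1} + b_r\right) \\
z_t &= \sigma_g\!\left(W_z x_t + R_z h_{t-1} + b_z\right) \\
\tilde{h}_t &= \sigma_s\!\left(W_h x_t + r_t \odot (R_h h_{t-1}) + b_h\right) \\
h_t &= (1 - z_t) \odot \tilde{h}_t + z_t \odot h_{t-1}
\end{align*}
```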
References
[1] Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078 (2014).
Extended Capabilities
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
The gru function supports GPU array input with these usage notes and limitations:
- When at least one of the following input arguments is a gpuArray or a dlarray with underlying data of type gpuArray, this function runs on the GPU:
  - X
  - H0
  - weights
  - recurrentWeights
  - bias
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2020a
R2024b: Specify the ReLU state activation function
To specify the ReLU state activation function, set the StateActivationFunction name-value argument to "relu".
R2023a: Specify reset gate mode
Specify the reset gate mode using the ResetGateMode name-value argument.