gelu - Apply Gaussian error linear unit (GELU) activation - MATLAB
Apply Gaussian error linear unit (GELU) activation
Since R2022b
Syntax
Y = gelu(X)
Y = gelu(X,Approximation=method)
Description
The Gaussian error linear unit (GELU) activation operation weights the input by its probability under a Gaussian distribution.
This operation is given by

\mathrm{GELU}(x) = \frac{x}{2}\left(1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right),

where erf denotes the error function.
Note
This function applies the GELU operation to dlarray data. If you want to apply the GELU activation within a dlnetwork object, use geluLayer.
Y = gelu(X) applies the GELU activation to the input data X.
Y = gelu(X,Approximation=method) also specifies the approximation method for the GELU operation. For example, Approximation="tanh" specifies the tanh approximation of the underlying error function.
Examples
Apply GELU Operation
Create a formatted dlarray object containing a batch of 128 28-by-28 images with three channels. Specify the format "SSCB" (spatial, spatial, channel, batch).
miniBatchSize = 128;
inputSize = [28 28];
numChannels = 3;

X = rand(inputSize(1),inputSize(2),numChannels,miniBatchSize);
X = dlarray(X,"SSCB");
View the size and format of the input data.
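For example, you can check them with the size and dims functions:

size(X)
dims(X)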
Apply the GELU activation.
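A minimal call for this step, assuming X is the dlarray created above:

Y = gelu(X);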
View the size and format of the output.
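For example:

size(Y)
dims(Y)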
Input Arguments
X — Input data
dlarray object
Input data, specified as a formatted or unformatted dlarray object.
method — Approximation method
"none" (default) | "tanh"
Approximation method, specified as one of these values:
"none"
— Do not use approximation."tanh"
— Approximate the underlying error function using
Tip
In MATLAB®, computing the tanh approximation is typically less accurate, and, for large input sizes, slower than computing the GELU activation without using an approximation. Use the tanh approximation when you want to reproduce models that use this approximation, such as BERT and GPT-2.
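For instance, a small sketch that compares the two options on the dlarray X from the example above (the size of the difference depends on the input values):

Y = gelu(X);
Ytanh = gelu(X,Approximation="tanh");
maxDifference = max(abs(extractdata(Y) - extractdata(Ytanh)),[],"all")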
Output Arguments
Y — GELU activations
dlarray object
GELU activations, returned as a dlarray object. The output Y has the same underlying data type as the input X.

If the input data X is a formatted dlarray object, then Y has the same dimension format as X. If the input data is not a formatted dlarray object, then Y is an unformatted dlarray object with the same dimension order as the input data.
Algorithms
Gaussian Error Linear Unit Activation
The Gaussian error linear unit (GELU) activation operation weights the input by its probability under a Gaussian distribution.
This operation is given by

\mathrm{GELU}(x) = \frac{x}{2}\left(1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right),

where erf denotes the error function given by

\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^{2}}\,dt.
When the Approximation option is "tanh", the software approximates the error function using

\operatorname{erf}\left(\frac{x}{\sqrt{2}}\right) \approx \tanh\left(\sqrt{\frac{2}{\pi}}\left(x + 0.044715x^{3}\right)\right).
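To make these formulas concrete, here is a small sketch (the helper variables are illustrative, not part of the toolbox) that evaluates both forms directly and compares them with the gelu function:

x = linspace(-4,4,9);

% Exact form: x/2 .* (1 + erf(x/sqrt(2)))
yExact = x./2 .* (1 + erf(x./sqrt(2)));

% Tanh approximation of the error function
yTanh = x./2 .* (1 + tanh(sqrt(2/pi).*(x + 0.044715.*x.^3)));

% Compare with the gelu function
yFun = extractdata(gelu(dlarray(x)));
yFunTanh = extractdata(gelu(dlarray(x),Approximation="tanh"));

max(abs(yFun - yExact))
max(abs(yFunTanh - yTanh))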
Extended Capabilities
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
The gelu function supports GPU array input with these usage notes and limitations:
- When the input argument X is a dlarray with underlying data of type gpuArray, this function runs on the GPU.
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
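For example, assuming a supported GPU and Parallel Computing Toolbox are installed, a sketch of GPU execution might look like this:

X = gpuArray(rand(28,28,3,128,"single"));
X = dlarray(X,"SSCB");
Y = gelu(X);            % runs on the GPU because the underlying data is a gpuArray
class(extractdata(Y))   % 'gpuArray'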
Version History
Introduced in R2022b