WordEmbeddingLayer - Word embedding layer for deep learning neural network - MATLAB
Word embedding layer for deep learning neural network
Description
A word embedding layer maps word indices to vectors.
Use a word embedding layer in a deep learning long short-term memory (LSTM) network. An LSTM network is a type of recurrent neural network (RNN) that can learn long-term dependencies between time steps of sequence data. A word embedding layer maps a sequence of word indices to embedding vectors and learns the word embedding during training.
This layer requires Deep Learning Toolbox™.
Creation
Syntax
Description
`layer` = wordEmbeddingLayer(`dimension`,`numWords`)
creates a word embedding layer and specifies the embedding dimension and vocabulary size.
`layer` = wordEmbeddingLayer(`dimension`,`numWords`,`Name,Value`)
sets optional properties using one or more name-value pairs. Enclose each property name in single quotes.
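For example, this call (a minimal sketch using the Name property described under Layer below) creates a layer with a custom name:

layer = wordEmbeddingLayer(300,5000,'Name','emb');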
Properties
Word Embedding
Dimension of the word embedding, specified as a positive integer.
Example: 300
Number of words in the model, specified as a positive integer. If the number of unique words in the training data is greater than NumWords, then the layer maps the out-of-vocabulary words to the same vector.
Since R2023b
Out-of-vocabulary word handling mode, specified as one of these values:
"map-to-last"
— Map out-of-vocabulary words to the last embedding vector inWeights
."error"
— Throw an error when layer receives out-of-vocabulary words. Use this option for models that already have an out-of-vocabulary token in its vocabulary, such as BERT.
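For example, this sketch (assuming R2023b or later, and that OOVMode can be set as a name-value pair at creation) creates a layer that throws an error for out-of-vocabulary input:

layer = wordEmbeddingLayer(300,5000,'OOVMode','error');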
Parameters and Initialization
Function to initialize the weights, specified as one of the following:
- 'narrow-normal' – Initialize the weights by independently sampling from a normal distribution with zero mean and standard deviation 0.01.
- 'glorot' – Initialize the weights with the Glorot initializer [1] (also known as the Xavier initializer). The Glorot initializer independently samples from a uniform distribution with zero mean and variance 2/(numIn + numOut), where numIn = NumWords + 1 and numOut = Dimension.
- 'he' – Initialize the weights with the He initializer [2]. The He initializer samples from a normal distribution with zero mean and variance 2/numIn, where numIn = NumWords + 1.
- 'orthogonal' – Initialize the input weights with Q, the orthogonal matrix given by the QR decomposition of Z = QR for a random matrix Z sampled from a unit normal distribution [3].
- 'zeros' – Initialize the weights with zeros.
- 'ones' – Initialize the weights with ones.
- Function handle – Initialize the weights with a custom function. If you specify a function handle, then the function must be of the form weights = func(sz), where sz is the size of the weights (see the sketch following this list).
The layer only initializes the weights when the Weights property is empty.
Data Types: char | string | function_handle
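As a sketch of the function handle option, this example initializes the weights using the WeightsInitializer name-value pair with a hypothetical, narrower standard deviation than the 'narrow-normal' default:

init = @(sz) 0.005*randn(sz);   % sz is the weight size that the layer passes in
layer = wordEmbeddingLayer(300,5000,'WeightsInitializer',init);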
Layer weights, specified as a Dimension-by-NumWords array or a Dimension-by-(NumWords+1) array.
If Weights is a Dimension-by-NumWords array, then the software automatically appends an extra column for out-of-vocabulary input when training a network using the trainNetwork function or when initializing a dlnetwork object.
For input integers i less than or equal to NumWords, the layer outputs the vector Weights(:,i). Otherwise, the layer outputs the vector Weights(:,NumWords+1).
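The lookup is easiest to see with small, hypothetical values. In this sketch, Dimension is 2 and NumWords is 3, with an explicit fourth column for out-of-vocabulary input:

W = [1 2 3 0; 4 5 6 0];                        % columns 1-3: vocabulary; column 4: OOV
layer = wordEmbeddingLayer(2,3,'Weights',W);
net = dlnetwork([sequenceInputLayer(1) layer]);
Y = predict(net,dlarray([2 9],'CT'))           % returns W(:,2) and W(:,4)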
Learn Rate and Regularization
Learning rate factor for the weights, specified as a nonnegative scalar.
The software multiplies this factor by the global learning rate to determine the learning rate for the weights in this layer. For example, if WeightLearnRateFactor is 2, then the learning rate for the weights in this layer is twice the current global learning rate. The software determines the global learning rate based on the settings you specify using the trainingOptions (Deep Learning Toolbox) function.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
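A common use of this factor (a sketch, not part of the original reference) is to freeze pretrained embedding weights by setting the factor to zero so that training does not update them. Here, pretrainedWeights is a hypothetical Dimension-by-NumWords array:

layer = wordEmbeddingLayer(dimension,numWords,'Weights',pretrainedWeights);
layer.WeightLearnRateFactor = 0;   % keep the pretrained embedding fixed during training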
L2 regularization factor for the weights, specified as a nonnegative scalar.
The software multiplies this factor by the global L2 regularization factor to determine the L2 regularization for the weights in this layer. For example, if WeightL2Factor is 2, then the L2 regularization for the weights in this layer is twice the global L2 regularization factor. You can specify the global L2 regularization factor using the trainingOptions (Deep Learning Toolbox) function.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Layer
Layer name, specified as a character vector or a string scalar. If Name is '', then the software automatically assigns a name at training time.
Data Types: char | string
This property is read-only.
Number of inputs to the layer, stored as 1. This layer accepts a single input only.
Data Types: double
This property is read-only.
Input names, stored as {'in'}. This layer accepts a single input only.
Data Types: cell
This property is read-only.
Number of outputs from the layer, stored as 1. This layer has a single output only.
Data Types: double
This property is read-only.
Output names, stored as {'out'}. This layer has a single output only.
Data Types: cell
Examples
Create a word embedding layer with embedding dimension 300 and 5000 words.
layer = wordEmbeddingLayer(300,5000)
layer = 
  WordEmbeddingLayer with properties:

       Name: ''
    OOVMode: 'map-to-last'

   Hyperparameters
    Dimension: 300
     NumWords: 5000

   Learnable Parameters
      Weights: []

  Show all properties
Include a word embedding layer in an LSTM network.
inputSize = 1;
embeddingDimension = 300;
numWords = 5000;
numHiddenUnits = 200;
numClasses = 10;
layers = [
    sequenceInputLayer(inputSize)
    wordEmbeddingLayer(embeddingDimension,numWords)
    lstmLayer(numHiddenUnits,'OutputMode','last')
    fullyConnectedLayer(numClasses)
    softmaxLayer]
layers = 
  5×1 Layer array with layers:

     1   ''   Sequence Input         Sequence input with 1 dimensions
     2   ''   Word Embedding Layer   Word embedding layer with 300 dimensions and 5000 unique words
     3   ''   LSTM                   LSTM with 200 hidden units
     4   ''   Fully Connected        10 fully connected layer
     5   ''   Softmax                softmax
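To train this network with the trainNetwork function, append a classification output layer and supply index sequences with labels. A hedged sketch, where XTrain (a cell array of row vectors of word indices, for example from doc2sequence) and YTrain (a categorical vector of labels) are assumed:

options = trainingOptions('adam','MaxEpochs',5);
net = trainNetwork(XTrain,YTrain,[layers; classificationLayer],options);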
To initialize a word embedding layer in a deep learning network with the weights from a pretrained word embedding, use the word2vec function to extract the layer weights and set the 'Weights' name-value pair of the wordEmbeddingLayer function. The word embedding layer expects columns of word vectors, so you must transpose the output of the word2vec function.
emb = fastTextWordEmbedding;
words = emb.Vocabulary;
dimension = emb.Dimension;
numWords = numel(words);
layer = wordEmbeddingLayer(dimension,numWords, ...
    'Weights',word2vec(emb,words)')
layer = 
  WordEmbeddingLayer with properties:

       Name: ''

   Hyperparameters
    Dimension: 300
     NumWords: 999994

   Learnable Parameters
      Weights: [300×999994 single]

  Show all properties
To create the corresponding word encoding from the word embedding, input the word embedding vocabulary to the wordEncoding function as a list of words.
enc = wordEncoding(words)
enc = 
  wordEncoding with properties:

      NumWords: 999994
    Vocabulary: [1×999994 string]
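With the encoding, you can convert tokenized documents into the index sequences that the word embedding layer consumes. A brief sketch, assuming a hypothetical set of documents and the doc2sequence function:

documents = tokenizedDocument(["an example of a short sentence" "a second short sentence"]);
sequences = doc2sequence(enc,documents);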
References
[1] Glorot, Xavier, and Yoshua Bengio. "Understanding the Difficulty of Training Deep Feedforward Neural Networks." In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 249–256. Sardinia, Italy: AISTATS, 2010. https://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf.
[2] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification." In 2015 IEEE International Conference on Computer Vision (ICCV), 1026–34. Santiago, Chile: IEEE, 2015. https://doi.org/10.1109/ICCV.2015.123.
[3] Saxe, Andrew M., James L. McClelland, and Surya Ganguli. "Exact Solutions to the Nonlinear Dynamics of Learning in Deep Linear Neural Networks." Preprint, submitted February 19, 2014. https://arxiv.org/abs/1312.6120.
Extended Capabilities
C/C++ Code Generation
Usage notes and limitations:
- The property OOVMode is always set to "map-to-last" when generating code that depends on third-party deep learning libraries.
- The property OOVMode reverts to "map-to-last" if the runtime check is disabled in the configuration settings. To enable the runtime check, set RuntimeChecks to true for generating standalone C/C++ code, or set IntegrityChecks to true for generating MEX code. For more information, see coder.config (MATLAB Coder) and coder.MexCodeConfig (MATLAB Coder).
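For illustration, this sketch enables the runtime check for standalone C/C++ code generation using the configuration object named above:

cfg = coder.config('lib');     % configuration object for standalone C/C++ code
cfg.RuntimeChecks = true;      % preserve the specified OOVMode behavior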
GPU Code Generation
Usage notes and limitations:
- The property OOVMode is always set to "map-to-last".
Version History
Introduced in R2018b
R2023b: Specify the out-of-vocabulary word handling mode using the OOVMode property. You can specify that the layer maps out-of-vocabulary words to the same embedding vector or throws an error. This property enables support for models that do not support out-of-vocabulary words, such as BERT.