EmbeddingBag
Compute sums or means of ‘bags’ of embeddings, without instantiating the intermediate embeddings.
For bags of constant length, no `per_sample_weights`, no indices equal to `padding_idx`, and with 2D inputs, this class

- with `mode="sum"` is equivalent to `Embedding` followed by `torch.sum(dim=1)`,
- with `mode="mean"` is equivalent to `Embedding` followed by `torch.mean(dim=1)`,
- with `mode="max"` is equivalent to `Embedding` followed by `torch.max(dim=1)`.
However, EmbeddingBag is much more time and memory efficient than using a chain of these operations.
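For illustration, a minimal sketch of this equivalence for `mode="sum"` with a 2D input (the weight values and indices here are arbitrary):

```python
import torch
import torch.nn as nn

weight = torch.randn(10, 3)                        # shared weight matrix; values are arbitrary
emb = nn.Embedding.from_pretrained(weight)
bag = nn.EmbeddingBag.from_pretrained(weight, mode='sum')

inp = torch.tensor([[1, 2, 4, 5], [4, 3, 2, 9]])   # 2D input: 2 bags of fixed length 4
out_chain = emb(inp).sum(dim=1)                    # Embedding followed by torch.sum(dim=1)
out_bag = bag(inp)                                 # same result, without the (2, 4, 3) intermediate
print(torch.allclose(out_chain, out_bag))          # True
```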
EmbeddingBag also supports per-sample weights as an argument to the forward pass. This scales the output of the Embedding before performing a weighted reduction as specified by `mode`. If `per_sample_weights` is passed, the only supported `mode` is `"sum"`, which computes a weighted sum according to `per_sample_weights`.
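A minimal sketch of a weighted sum with `per_sample_weights` (the index and weight values here are arbitrary):

```python
import torch
import torch.nn as nn

bag = nn.EmbeddingBag(10, 3, mode='sum')           # per_sample_weights is only supported with mode='sum'
inp = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])       # 1D input holding two bags
offsets = torch.tensor([0, 4])                     # bag starts at positions 0 and 4
weights = torch.tensor([0.5, 1.0, 1.0, 2.0,        # one scaling factor per entry of `inp`
                        1.0, 1.0, 0.1, 0.1])
out = bag(inp, offsets, per_sample_weights=weights)
print(out.shape)                                   # torch.Size([2, 3])
```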
Parameters
- num_embeddings (int) – size of the dictionary of embeddings
- embedding_dim (int) – the size of each embedding vector
- max_norm (float, optional) – If given, each embedding vector with norm larger than `max_norm` is renormalized to have norm `max_norm`.
- norm_type (float, optional) – The p of the p-norm to compute for the `max_norm` option. Default `2`.
- scale_grad_by_freq (bool, optional) – if given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default `False`. Note: this option is not supported when `mode="max"`.
- mode (str, optional) – `"sum"`, `"mean"` or `"max"`. Specifies the way to reduce the bag. `"sum"` computes the weighted sum, taking `per_sample_weights` into consideration. `"mean"` computes the average of the values in the bag, `"max"` computes the max value over each bag. Default: `"mean"`
- sparse (bool, optional) – if `True`, gradient w.r.t. `weight` matrix will be a sparse tensor. See Notes for more details regarding sparse gradients. Note: this option is not supported when `mode="max"`.
- include_last_offset (bool, optional) – if `True`, `offsets` has one additional element, where the last element is equivalent to the size of `indices`. This matches the CSR format (see the sketch after this list).
- padding_idx (int, optional) – If specified, the entries at `padding_idx` do not contribute to the gradient; therefore, the embedding vector at `padding_idx` is not updated during training, i.e. it remains as a fixed "pad". For a newly constructed EmbeddingBag, the embedding vector at `padding_idx` will default to all zeros, but can be updated to another value to be used as the padding vector. Note that the embedding vector at `padding_idx` is excluded from the reduction.
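A minimal sketch of the CSR-style `offsets` used with `include_last_offset=True` (the index values here are arbitrary):

```python
import torch
import torch.nn as nn

bag = nn.EmbeddingBag(10, 3, mode='sum', include_last_offset=True)
inp = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])
# CSR-style offsets: the bag start positions [0, 4] plus a trailing element equal to len(inp)
offsets = torch.tensor([0, 4, 8])
out = bag(inp, offsets)
print(out.shape)                                   # torch.Size([2, 3]) -- still two bags
```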
Variables
weight (Tensor) – the learnable weights of the module of shape `(num_embeddings, embedding_dim)`, initialized from $\mathcal{N}(0, 1)$.
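A minimal sketch of inspecting `weight`; the module here is constructed with a `padding_idx` so the zero-initialized padding row is visible (the sizes are arbitrary):

```python
import torch
import torch.nn as nn

bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=3, padding_idx=2)
print(bag.weight.shape)    # torch.Size([10, 3])
print(bag.weight[2])       # all zeros: the row at padding_idx starts out as the zero vector
```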
Examples:
```python
>>> # an EmbeddingBag module containing 10 tensors of size 3
>>> embedding_sum = nn.EmbeddingBag(10, 3, mode='sum')
>>> # a batch of 2 samples of 4 indices each
>>> input = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9], dtype=torch.long)
>>> offsets = torch.tensor([0, 4], dtype=torch.long)
>>> embedding_sum(input, offsets)
tensor([[-0.8861, -5.4350, -0.0523],
        [ 1.1306, -2.5798, -1.0044]])

>>> # Example with padding_idx
>>> embedding_sum = nn.EmbeddingBag(10, 3, mode='sum', padding_idx=2)
>>> input = torch.tensor([2, 2, 2, 2, 4, 3, 2, 9], dtype=torch.long)
>>> offsets = torch.tensor([0, 4], dtype=torch.long)
>>> embedding_sum(input, offsets)
tensor([[ 0.0000,  0.0000,  0.0000],
        [-0.7082,  3.2145, -2.6251]])

>>> # An EmbeddingBag can be loaded from an Embedding like so
>>> embedding = nn.Embedding(10, 3, padding_idx=2)
>>> embedding_sum = nn.EmbeddingBag.from_pretrained(
...     embedding.weight,
...     padding_idx=embedding.padding_idx,
...     mode='sum')
```
forward(input, offsets=None, per_sample_weights=None)
Forward pass of EmbeddingBag.
Parameters
- input (Tensor) – Tensor containing bags of indices into the embedding matrix.
- offsets (Tensor, optional) – Only used when `input` is 1D. `offsets` determines the starting index position of each bag (sequence) in `input`.
- per_sample_weights (Tensor, optional) – a tensor of float / double weights, or None to indicate all weights should be taken to be `1`. If specified, `per_sample_weights` must have exactly the same shape as `input` and is treated as having the same `offsets`, if those are not `None`. Only supported for `mode='sum'`.
Returns
Tensor output shape of (B, embedding_dim).
Return type

Tensor
Note
A few notes about input and offsets:
- `input` and `offsets` have to be of the same type, either int or long.
- If `input` is 2D of shape `(B, N)`, it will be treated as `B` bags (sequences) each of fixed length `N`, and this will return `B` values aggregated in a way depending on the `mode`. `offsets` is ignored and required to be `None` in this case.
- If `input` is 1D of shape `(N)`, it will be treated as a concatenation of multiple bags (sequences). `offsets` is required to be a 1D tensor containing the starting index positions of each bag in `input`. Therefore, for `offsets` of shape `(B)`, `input` will be viewed as having `B` bags. Empty bags (i.e., having 0-length) will have returned vectors filled by zeros.
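A minimal sketch of the 1D form, including an empty bag (the index values here are arbitrary):

```python
import torch
import torch.nn as nn

bag = nn.EmbeddingBag(10, 3, mode='sum')
inp = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])
# three bags: inp[0:4], an empty bag, and inp[4:8]
offsets = torch.tensor([0, 4, 4])
out = bag(inp, offsets)
print(out.shape)           # torch.Size([3, 3])
print(out[1])              # zeros -- the empty bag comes back as the zero vector
```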
classmethod from_pretrained(embeddings, freeze=True, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, mode='mean', sparse=False, include_last_offset=False, padding_idx=None)
Create EmbeddingBag instance from given 2-dimensional FloatTensor.
Parameters
- embeddings (Tensor) – FloatTensor containing weights for the EmbeddingBag. First dimension is being passed to EmbeddingBag as ‘num_embeddings’, second as ‘embedding_dim’.
- freeze (bool, optional) – If `True`, the tensor does not get updated in the learning process. Equivalent to `embeddingbag.weight.requires_grad = False`. Default: `True` (see the sketch after this list)
- max_norm (float, optional) – See module initialization documentation. Default: `None`
- norm_type (float, optional) – See module initialization documentation. Default `2`.
- scale_grad_by_freq (bool, optional) – See module initialization documentation. Default `False`.
- mode (str, optional) – See module initialization documentation. Default: `"mean"`
- sparse (bool, optional) – See module initialization documentation. Default: `False`.
- include_last_offset (bool, optional) – See module initialization documentation. Default: `False`.
- padding_idx (int, optional) – See module initialization documentation. Default: `None`.
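A minimal sketch of loading pretrained weights that should continue to train, i.e. `freeze=False` (the weight values here are arbitrary):

```python
import torch
import torch.nn as nn

weight = torch.randn(10, 3)                        # stand-in for real pretrained weights
bag = nn.EmbeddingBag.from_pretrained(weight, freeze=False, mode='mean')
print(bag.weight.requires_grad)                    # True -- the weights will be updated during training
```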
Return type

EmbeddingBag
Examples:
```python
>>> # FloatTensor containing pretrained weights
>>> weight = torch.FloatTensor([[1, 2.3, 3], [4, 5.1, 6.3]])
>>> embeddingbag = nn.EmbeddingBag.from_pretrained(weight)
>>> # Get embeddings for index 1
>>> input = torch.LongTensor([[1, 0]])
>>> embeddingbag(input)
tensor([[ 2.5000,  3.7000,  4.6500]])
```