Embedding

A simple lookup table that stores embeddings of a fixed dictionary and size.

This module is often used to store word embeddings and retrieve them using indices. The input to the module is a tensor of integer indices, and the output holds the corresponding embedding vectors: its shape is the input shape followed by the embedding dimension.
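A minimal sketch of this lookup, with sizes chosen arbitrarily for illustration: calling the module returns the rows of `embedding.weight` selected by the index tensor, so it matches plain advanced indexing into the weight matrix.

```python
import torch
import torch.nn as nn

# Arbitrary illustrative sizes: 10 embeddings, each of dimension 3.
embedding = nn.Embedding(num_embeddings=10, embedding_dim=3)

# Indices may have any shape; here a 2 x 4 batch.
idx = torch.tensor([[1, 2, 4, 5], [4, 3, 2, 9]])
out = embedding(idx)

# The output shape is idx.shape + (embedding_dim,).
print(out.shape)  # torch.Size([2, 4, 3])

# Each output vector is exactly the weight row selected by its index.
print(torch.equal(out, embedding.weight[idx]))  # True
```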

Note

When max_norm is not None, Embedding’s forward method will modify the weight tensor in-place: each looked-up row whose norm exceeds max_norm is rescaled to have norm max_norm. Since tensors needed for gradient computations cannot be modified in-place, performing a differentiable operation on Embedding.weight before calling Embedding’s forward method requires cloning Embedding.weight when max_norm is not None. For example:

    >>> import torch
    >>> import torch.nn as nn
    >>> n, d, m = 3, 5, 7
    >>> embedding = nn.Embedding(n, d, max_norm=1.0)
    >>> W = torch.randn((m, d), requires_grad=True)
    >>> idx = torch.tensor([1, 2])
    >>> a = embedding.weight.clone() @ W.t()  # weight must be cloned for this to be differentiable
    >>> b = embedding(idx) @ W.t()  # modifies weight in-place
    >>> out = a.unsqueeze(0) + b.unsqueeze(1)
    >>> loss = out.sigmoid().prod()
    >>> loss.backward()
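To make the in-place modification visible, the sketch below (sizes and fill value chosen only for illustration) sets every row to norm 2, looks up two indices with max_norm=1.0, and then inspects the stored weights. Only the rows that were looked up are rescaled; the others keep their original norm.

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(5, 4, max_norm=1.0)

# Fill every row with ones so each row has L2 norm sqrt(4) = 2 > max_norm.
with torch.no_grad():
    embedding.weight.fill_(1.0)

idx = torch.tensor([0, 2])
out = embedding(idx)  # renormalizes rows 0 and 2 of embedding.weight in place

row_norms = embedding.weight.detach().norm(dim=1)
# Rows 0 and 2 now have norm ~1.0; rows 1, 3 and 4 still have norm 2.0.
print(row_norms)
```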

    >>> # an Embedding module containing 10 tensors of size 3
    >>> embedding = nn.Embedding(10, 3)
    >>> # a batch of 2 samples of 4 indices each
    >>> input = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
    >>> embedding(input)
    tensor([[[-0.0251, -1.6902,  0.7172],
             [-0.6431,  0.0748,  0.6969],
             [ 1.4970,  1.3448, -0.9685],
             [-0.3677, -2.7265, -0.1685]],

            [[ 1.4970,  1.3448, -0.9685],
             [ 0.4362, -0.4004,  0.9400],
             [-0.6431,  0.0748,  0.6969],
             [ 0.9124, -2.3616,  1.1151]]])
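One way to read the lookup table is as a product of one-hot index vectors with the weight matrix. The sketch below checks this on the batch from the example above. It only illustrates what the lookup computes: the one-hot form materializes a tensor as wide as the dictionary and is far more expensive than indexing.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

embedding = nn.Embedding(10, 3)
indices = torch.tensor([[1, 2, 4, 5], [4, 3, 2, 9]])

# One-hot encode each index: shape (2, 4, 10).
one_hot = F.one_hot(indices, num_classes=10).to(embedding.weight.dtype)

# (2, 4, 10) @ (10, 3) -> (2, 4, 3), the same shape as embedding(indices).
via_matmul = one_hot @ embedding.weight

print(torch.allclose(via_matmul, embedding(indices)))  # True
```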

    >>> # example with padding_idx
    >>> embedding = nn.Embedding(10, 3, padding_idx=0)
    >>> input = torch.LongTensor([[0, 2, 0, 5]])
    >>> embedding(input)
    tensor([[[ 0.0000,  0.0000,  0.0000],
             [ 0.1535, -2.0309,  0.9315],
             [ 0.0000,  0.0000,  0.0000],
             [-0.1655,  0.9897,  0.0635]]])
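Besides being initialized to zeros, the padding_idx row receives no gradient: positions holding padding_idx do not contribute to it. The sketch below backpropagates a simple sum through the lookup from the example above and inspects `embedding.weight.grad`; the choice of `.sum()` as the loss is only for illustration.

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(10, 3, padding_idx=0)
indices = torch.tensor([[0, 2, 0, 5]])

# The padding row starts out as all zeros.
print(embedding.weight[0])

# Backpropagate a simple scalar through the lookup.
embedding(indices).sum().backward()
grad = embedding.weight.grad

# Row 0 gets a zero gradient even though index 0 appears twice;
# rows 2 and 5 each appear once and get a gradient of ones.
print(grad[0], grad[2], grad[5])
```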

    >>> # example of changing `pad` vector
    >>> padding_idx = 0
    >>> embedding = nn.Embedding(3, 3, padding_idx=padding_idx)
    >>> embedding.weight
    Parameter containing:
    tensor([[ 0.0000,  0.0000,  0.0000],
            [-0.7895, -0.7089, -0.0364],
            [ 0.6778,  0.5803,  0.2678]], requires_grad=True)
    >>> with torch.no_grad():
    ...     embedding.weight[padding_idx] = torch.ones(3)
    >>> embedding.weight
    Parameter containing:
    tensor([[ 1.0000,  1.0000,  1.0000],
            [-0.7895, -0.7089, -0.0364],
            [ 0.6778,  0.5803,  0.2678]], requires_grad=True)
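Because the padding row still receives no gradient after it is changed, a custom pad vector stays fixed during training. The sketch below assumes plain `torch.optim.SGD` (learning rate chosen arbitrarily) and a sum loss. It takes one step on a batch that contains the padding index: the pad row keeps its ones while the other rows are updated.

```python
import torch
import torch.nn as nn

padding_idx = 0
embedding = nn.Embedding(3, 3, padding_idx=padding_idx)

# Replace the pad vector with ones, as in the example above.
with torch.no_grad():
    embedding.weight[padding_idx] = torch.ones(3)

optimizer = torch.optim.SGD(embedding.parameters(), lr=0.1)
before = embedding.weight.detach().clone()

# One training step on a batch that contains the padding index.
indices = torch.tensor([[0, 1, 2, 0]])
loss = embedding(indices).sum()  # gradient of ones for rows 1 and 2
optimizer.zero_grad()
loss.backward()
optimizer.step()

after = embedding.weight.detach()
print(after[padding_idx])  # still all ones: the pad row got no gradient
print(after[1] - before[1])  # each entry moved by -lr * 1 = -0.1
```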
