Difference Between "Hidden" and "Output" in PyTorch LSTM

Last Updated : 21 Apr, 2025

Long Short-Term Memory (LSTM) networks are a type of recurrent neural network (RNN) that are widely used for sequence prediction tasks. In PyTorch, the nn.LSTM module is a powerful tool for implementing these networks. However, understanding the difference between the "hidden" and "output" states of an LSTM can be confusing for many. This article aims to clarify these concepts, providing detailed explanations and examples to help you understand how LSTMs work in PyTorch.


1. Hidden State (h_n)

The hidden state in an LSTM represents the **short-term memory** of the network. It contains information about the sequence that has been processed so far and is updated at each time step. The hidden state is crucial for maintaining information across time steps and layers.

**Shape:** The hidden state h_n has the shape (num_layers * num_directions, batch, hidden_size). This shape indicates that the hidden state is maintained for each layer and direction in the LSTM.
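As a quick sketch of that layout, the snippet below (with illustrative sizes chosen here, not from the article's later example) builds a small unidirectional LSTM and separates the layer and direction axes of h_n explicitly:

```python
import torch
import torch.nn as nn

num_layers, hidden_size, batch_size = 2, 8, 4
lstm = nn.LSTM(input_size=6, hidden_size=hidden_size, num_layers=num_layers)

x = torch.randn(5, batch_size, 6)  # (seq_len, batch, input_size)
_, (h_n, c_n) = lstm(x)

print(h_n.shape)  # torch.Size([2, 4, 8]) -> (num_layers * num_directions, batch, hidden_size)

# For a unidirectional LSTM, num_directions is 1, so the layer and
# direction axes can be pulled apart explicitly:
h_n_view = h_n.view(num_layers, 1, batch_size, hidden_size)
print(h_n_view.shape)  # torch.Size([2, 1, 4, 8])
```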

2. Output (output)

The output of an LSTM is the sequence of hidden states from the last layer for each time step. Unlike the hidden state, which is only the last hidden state for each sequence, the output includes the hidden state for every time step in the sequence.

**Shape:** The output has the shape (seq_len, batch, num_directions * hidden_size), where seq_len is the length of the input sequence.
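The relationship between the two can be checked directly: for a unidirectional LSTM, the last time step of output equals the final hidden state of the last layer. A minimal sketch (sizes here are arbitrary illustrations):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=4, hidden_size=6, num_layers=2)
x = torch.randn(7, 3, 4)  # (seq_len, batch, input_size)

output, (h_n, _) = lstm(x)

print(output.shape)  # torch.Size([7, 3, 6]) -> a hidden state per time step
# output[-1] is the last time step of the last layer;
# h_n[-1] is the final hidden state of the last layer -- they match:
assert torch.allclose(output[-1], h_n[-1])
```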

Below is an example of how to implement an LSTM in PyTorch and access the hidden state and output:

```python
import torch
import torch.nn as nn

# Define LSTM parameters
input_size = 10
hidden_size = 20
num_layers = 2
batch_size = 3
seq_len = 5

# Initialize LSTM
lstm = nn.LSTM(input_size, hidden_size, num_layers)

# Create random input tensor
input_tensor = torch.randn(seq_len, batch_size, input_size)

# Initialize hidden and cell states
h0 = torch.zeros(num_layers, batch_size, hidden_size)
c0 = torch.zeros(num_layers, batch_size, hidden_size)

# Forward pass through LSTM
output, (hn, cn) = lstm(input_tensor, (h0, c0))

print("Output shape:", output.shape)        # (seq_len, batch, num_directions * hidden_size)
print("Hidden state shape:", hn.shape)      # (num_layers * num_directions, batch, hidden_size)
```

**Output:**

Output shape: torch.Size([5, 3, 20])
Hidden state shape: torch.Size([2, 3, 20])
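A variation worth noting: passing bidirectional=True to nn.LSTM doubles num_directions, which changes both shapes accordingly. A short sketch using the same parameters as the example above:

```python
import torch
import torch.nn as nn

# Same sizes as before, but with a bidirectional LSTM (num_directions = 2)
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, bidirectional=True)
x = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)

output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([5, 3, 40]) -> num_directions * hidden_size = 40
print(h_n.shape)     # torch.Size([4, 3, 20]) -> num_layers * num_directions = 4
```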

Explanation of the Code

- nn.LSTM(input_size, hidden_size, num_layers) creates a two-layer, unidirectional LSTM.
- input_tensor has shape (seq_len, batch, input_size), the default layout when batch_first=False.
- output collects the last layer's hidden state at every time step, giving shape (5, 3, 20).
- hn and cn hold the final hidden and cell states for each of the two layers, giving shape (2, 3, 20).

Performance Implications

Because output stores a hidden state for every time step, its memory footprint grows with seq_len. If you only need a sequence-level summary (for example, for classification), use hn rather than retaining the full output tensor.
Conclusion

Understanding the difference between the hidden state and output in PyTorch's LSTM is crucial for effectively using this powerful neural network architecture. The hidden state provides a summary of the sequence, while the output contains detailed information for each time step.