LSTM Networks

Last Updated : 5 Jan, 2026

Long Short-Term Memory (LSTM) networks are a type of recurrent neural network (RNN) designed to learn from sequence data while overcoming a key limitation of traditional RNNs: vanishing gradients, which make long-range dependencies hard to learn. Their memory-cell structure lets LSTMs retain information over long time intervals, which makes them effective for tasks that involve patterns across time.

Working of LSTM

An LSTM cell processes each time step in four stages:

Step 1: Decide What to Forget (Forget Gate)
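For reference, the forget gate is commonly written as shown below. The notation is the usual textbook one and is an assumption here, since the article does not define symbols: $x_t$ is the current input, $h_{t-1}$ the previous hidden state, $W_f$ and $b_f$ the gate's weights and bias, and $\sigma$ the sigmoid function. Each entry of $f_t$ lies between 0 (forget) and 1 (keep).

```latex
f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)
```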

Step 2: Decide What New Information to Add (Input Gate + Candidate Memory)
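In the usual textbook notation (assumed here, not defined in the article), this step computes two quantities from the previous hidden state $h_{t-1}$ and the current input $x_t$: an input gate $i_t$ that decides how much new information to admit, and a candidate memory $\tilde{C}_t$ holding the proposed new values. $W_i$, $b_i$, $W_C$ and $b_C$ are learned weights and biases, and $\sigma$ is the sigmoid function.

```latex
i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)
\qquad
\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)
```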

Step 3: Update the Cell State (Memory Highway)
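Written out in the usual textbook notation (an assumption, as the article defines no symbols), the new cell state $C_t$ combines the old state $C_{t-1}$, scaled by the forget gate $f_t$, with the candidate memory $\tilde{C}_t$, scaled by the input gate $i_t$. The symbol $\odot$ denotes element-wise multiplication. Because the update is additive, information can pass through many time steps largely unchanged, which is why the cell state is often called a memory highway.

```latex
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
```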

Step 4: Generate the Output (Output Gate)
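The four steps can be sketched as a single time step in NumPy. This is a minimal illustration of the standard LSTM formulation, not the Keras implementation used later; the function names `lstm_step` and `init_params`, the parameter layout, and the toy sizes are all assumptions made for the example. In Step 4, the output gate decides how much of the squashed cell state becomes the new hidden state.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """Run one LSTM time step.

    params maps a gate name to (W, b), where W has shape
    (hidden_size, hidden_size + input_size). Hypothetical layout.
    """
    concat = np.concatenate([h_prev, x_t])  # [h_{t-1}, x_t]

    W_f, b_f = params["forget"]
    W_i, b_i = params["input"]
    W_c, b_c = params["candidate"]
    W_o, b_o = params["output"]

    f_t = sigmoid(W_f @ concat + b_f)        # Step 1: what to forget
    i_t = sigmoid(W_i @ concat + b_i)        # Step 2: how much to add
    c_tilde = np.tanh(W_c @ concat + b_c)    # Step 2: candidate memory
    c_t = f_t * c_prev + i_t * c_tilde       # Step 3: update cell state
    o_t = sigmoid(W_o @ concat + b_o)        # Step 4: output gate
    h_t = o_t * np.tanh(c_t)                 # Step 4: new hidden state
    return h_t, c_t


def init_params(input_size, hidden_size, seed=0):
    """Small random weights and zero biases for each of the four blocks."""
    rng = np.random.default_rng(seed)
    shape = (hidden_size, hidden_size + input_size)
    return {
        name: (rng.normal(scale=0.1, size=shape), np.zeros(hidden_size))
        for name in ("forget", "input", "candidate", "output")
    }


# Feed a short toy sequence through the cell, one value per time step.
params = init_params(input_size=1, hidden_size=4)
h, c = np.zeros(4), np.zeros(4)
for value in [0.1, 0.2, 0.3]:
    h, c = lstm_step(np.array([value]), h, c, params)
```

The hidden state `h` and cell state `c` are carried from one step to the next, which is how the cell accumulates context across the sequence.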

Step-By-Step Implementation

Here we build an LSTM-based deep learning model to predict IBM stock prices from historical data.

The dataset used can be downloaded from here.

Step 1: Import Required Libraries

import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from keras.layers import LSTM, Dense, Dropout
from keras.models import Sequential
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler

plt.style.use('fivethirtyeight')


def plot_predictions(real_prices, predicted_prices):
    # Overlay actual and predicted prices on one chart
    plt.figure(figsize=(16, 6))
    plt.plot(real_prices, color='green', label='Actual IBM Stock Price')
    plt.plot(predicted_prices, color='orange', label='Predicted IBM Stock Price')
    plt.title('IBM Stock Price Prediction')
    plt.xlabel('Time')
    plt.ylabel('Stock Price')
    plt.legend()
    plt.show()


def calculate_rmse(real_prices, predicted_prices):
    # Root mean squared error, in the same units as the prices
    rmse = math.sqrt(mean_squared_error(real_prices, predicted_prices))
    print(f"Root Mean Squared Error (RMSE): {rmse:.4f}")

Step 2: Load and Visualize IBM Stock Dataset

dataset = pd.read_csv("IBM_2006-01-01_to_2018-01-01.csv",
                      index_col='Date', parse_dates=['Date'])

training_set = dataset.loc[:'2016', 'High'].values.reshape(-1, 1)
test_set = dataset.loc['2017':, 'High'].values.reshape(-1, 1)

plt.figure(figsize=(16, 4))
plt.plot(dataset.loc[:'2016', 'High'], label='Training Set (Before 2017)')
plt.plot(dataset.loc['2017':, 'High'], label='Test Set (2017 and beyond)')
plt.title('IBM Stock Price')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()

Output:

[Figure: IBM stock data]

Step 3: Apply Feature Scaling

scaler = MinMaxScaler(feature_range=(0, 1))
scaled_training = scaler.fit_transform(training_set)
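To make the scaling step concrete, here is a small pure-Python sketch of what min-max scaling to the range (0, 1) does: each value x becomes (x - min) / (max - min), with min and max learned from the training data only. The helper names and the toy prices are invented for illustration. It also shows why test values can land outside [0, 1] when they exceed the training range.

```python
def fit_min_max(values):
    """Learn the scaling statistics from the training values only."""
    return min(values), max(values)


def scale(values, lo, hi):
    """Map values so that lo becomes 0.0 and hi becomes 1.0."""
    return [(v - lo) / (hi - lo) for v in values]


def unscale(scaled, lo, hi):
    """Invert the scaling to recover values in the original units."""
    return [s * (hi - lo) + lo for s in scaled]


train = [100.0, 150.0, 200.0]  # toy training prices
test = [175.0, 225.0]          # toy test prices; 225 is above the training max

lo, hi = fit_min_max(train)
scaled_train = scale(train, lo, hi)      # [0.0, 0.5, 1.0]
scaled_test = scale(test, lo, hi)        # [0.75, 1.25], second value exceeds 1
restored = unscale(scaled_test, lo, hi)  # [175.0, 225.0]
```

Reusing the training statistics on the test data keeps information about future prices out of the preprocessing step.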

Step 4: Prepare Training Sequences

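X_train = []
y_train = []

# Each sample is the previous 60 scaled prices; the target is the next price
for i in range(60, len(scaled_training)):
    X_train.append(scaled_training[i - 60:i, 0])
    y_train.append(scaled_training[i, 0])

X_train = np.array(X_train)
y_train = np.array(y_train)

# Reshape to (samples, time steps, features), the layout the LSTM layer expects
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)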
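The windowing logic is easier to follow on a tiny series. The sketch below uses a lookback of 3 instead of 60 and a made-up list of numbers; `make_windows` is a hypothetical helper, not part of the tutorial code.

```python
def make_windows(series, lookback):
    """Split a 1-D series into (inputs, targets) with a sliding window."""
    inputs, targets = [], []
    for i in range(lookback, len(series)):
        inputs.append(series[i - lookback:i])  # the previous `lookback` values
        targets.append(series[i])              # the value to predict
    return inputs, targets


toy_series = [10, 11, 12, 13, 14, 15]
X, y = make_windows(toy_series, lookback=3)

for window, target in zip(X, y):
    print(window, "->", target)
# [10, 11, 12] -> 13
# [11, 12, 13] -> 14
# [12, 13, 14] -> 15
```

A series of length n yields n - lookback samples, so the first 60 rows of the training set serve only as history for the first target.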

Step 5: Build the LSTM Model

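lstm_model = Sequential()

# Four stacked LSTM layers, each followed by dropout for regularization.
# return_sequences=True passes the full sequence on to the next LSTM layer.
lstm_model.add(LSTM(units=50, return_sequences=True,
                    input_shape=(X_train.shape[1], 1)))
lstm_model.add(Dropout(0.2))

lstm_model.add(LSTM(units=50, return_sequences=True))
lstm_model.add(Dropout(0.2))

lstm_model.add(LSTM(units=50, return_sequences=True))
lstm_model.add(Dropout(0.2))

# The last LSTM layer returns only its final hidden state
lstm_model.add(LSTM(units=50))
lstm_model.add(Dropout(0.2))

# Single output unit: the predicted (scaled) price
lstm_model.add(Dense(units=1))

lstm_model.compile(optimizer='rmsprop', loss='mean_squared_error')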
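As a sanity check on the architecture, the number of trainable parameters can be worked out by hand. The sketch below assumes the default Keras layer settings (one bias vector per LSTM block, no parameters in Dropout); the helper names are invented for the example. Each LSTM layer has four weight blocks (forget, input, candidate, output), each with input weights, recurrent weights and a bias.

```python
def lstm_params(input_dim, units):
    """Trainable parameters in one LSTM layer with 4 weight blocks."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Weights plus biases of a fully connected layer."""
    return input_dim * units + units


layer_counts = [
    lstm_params(input_dim=1, units=50),   # first layer sees 1 feature per step
    lstm_params(input_dim=50, units=50),  # later layers see 50 features
    lstm_params(input_dim=50, units=50),
    lstm_params(input_dim=50, units=50),
    dense_params(input_dim=50, units=1),
]
total = sum(layer_counts)

print(layer_counts)  # [10400, 20200, 20200, 20200, 51]
print(total)         # 71051
```

Under those assumptions, the total should agree with the figure reported by `lstm_model.summary()`.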

Step 6: Train the Model

lstm_model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=1)

Step 7: Prepare Test Data for Prediction

dataset_total = pd.concat(
    (dataset['High'][:'2016'], dataset['High']['2017':]), axis=0)

# Keep the last 60 training values so the first test window is complete
inputs = dataset_total[len(dataset_total) - len(test_set) - 60:].values
inputs = inputs.reshape(-1, 1)
inputs = scaler.transform(inputs)

X_test = []
for i in range(60, len(inputs)):
    X_test.append(inputs[i - 60:i, 0])

X_test = np.array(X_test)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)
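The slice arithmetic in this step is easy to get wrong, so here is a toy version with a lookback of 3 instead of 60. The lists are made up for illustration. It shows that prepending the last `lookback` training values yields exactly one input window per test point.

```python
lookback = 3
train = [1, 2, 3, 4, 5, 6]  # toy training series
test = [7, 8, 9]            # toy test series

full = train + test
# Last `lookback` training values followed by the whole test set
inputs = full[len(full) - len(test) - lookback:]  # [4, 5, 6, 7, 8, 9]

windows = [inputs[i - lookback:i] for i in range(lookback, len(inputs))]
# windows[k] holds the 3 values that precede test[k]
for window, target in zip(windows, test):
    print(window, "->", target)
# [4, 5, 6] -> 7
# [5, 6, 7] -> 8
# [6, 7, 8] -> 9
```

Note that later windows contain actual test values from earlier days, so each prediction is one step ahead of known history rather than a long-range forecast.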

Step 8: Predict Stock Prices

predicted_prices_scaled = lstm_model.predict(X_test)
predicted_prices = scaler.inverse_transform(predicted_prices_scaled)

Step 9: Plot Actual vs Predicted Prices

plot_predictions(test_set, predicted_prices)


Output:

[Figure: Stock Price Prediction]

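In the plot, the predicted prices follow the actual prices closely. To put a number on the fit, pass the same arrays to the calculate_rmse helper defined in Step 1: calculate_rmse(test_set, predicted_prices).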
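A plot alone can hide systematic error, so a numeric metric is a useful complement. The sketch below works through root mean squared error by hand on four made-up price pairs (the numbers are illustrative and not taken from the IBM dataset); the `rmse` function is a hypothetical stand-in that computes the same quantity as the scikit-learn based helper.

```python
import math


def rmse(actual, predicted):
    """Square each error, average them, then take the square root."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


actual = [150.0, 152.0, 151.0, 155.0]
predicted = [149.0, 153.0, 151.0, 153.0]

# Squared errors are 1, 1, 0 and 4; their mean is 1.5
error = rmse(actual, predicted)
print(f"RMSE: {error:.4f}")  # RMSE: 1.2247
```

Because RMSE is expressed in the same units as the prices, it reads directly as a typical prediction error in dollars, with large misses weighted more heavily.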

You can download the full code from here.

LSTM Variations

1. Vanilla LSTM (Standard LSTM)

2. Stacked LSTM (Deep LSTM)

3. Bidirectional LSTM (BiLSTM)

4. Peephole LSTM

5. CNN–LSTM Hybrid

6. Encoder–Decoder LSTM

7. Attention-LSTM Models

LSTM vs. GRU

Let's compare LSTM and GRU:

| Feature | LSTM | GRU |
| --- | --- | --- |
| Gates | 3 gates (Input, Forget, Output) | 2 gates (Update, Reset) |
| Memory Structure | Separate Cell State + Hidden State | Single combined Hidden State |
| Complexity | More complex, more parameters | Simpler, fewer parameters |
| Training Speed | Slower | Faster |
| Long-Term Dependency Handling | Stronger for very long sequences | Good but slightly less precise for very long dependencies |
| Model Size & Efficiency | Larger and more resource-heavy | Smaller, efficient, suitable for real-time and mobile use |
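The difference in parameter count can be made concrete with a quick calculation. The sketch below assumes the textbook formulations, in which an LSTM layer has four weight blocks and a GRU layer has three, each with input weights, recurrent weights and one bias vector. The helper names are invented for the example, and some library implementations of GRU carry an extra bias vector, so real counts may be slightly higher.

```python
def lstm_params(input_dim, units):
    """Textbook LSTM: 4 blocks (forget, input, candidate, output)."""
    return 4 * (units * (input_dim + units) + units)


def gru_params(input_dim, units):
    """Textbook GRU: 3 blocks (update, reset, candidate)."""
    return 3 * (units * (input_dim + units) + units)


lstm_count = lstm_params(input_dim=1, units=50)  # 10400
gru_count = gru_params(input_dim=1, units=50)    # 7800
ratio = gru_count / lstm_count                   # 0.75

print(lstm_count, gru_count, ratio)
```

For the same layer width, the textbook GRU therefore needs three quarters of the parameters of an LSTM, which is consistent with the smaller model size and faster training noted in the table.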

Applications