Streaming Responses in LangChain

Last Updated : 27 Oct, 2025

Streaming in LangChain lets an application receive a language model's output incrementally, chunk by chunk (roughly token by token), instead of waiting for the entire response to complete. The result resembles live typing in a chat application: users get immediate feedback and perceived latency drops, even though the total generation time stays roughly the same.

- **Token-by-token output:** Users see the model's response as it is generated.
- **Improved interactivity:** Applications feel faster and more responsive.
- **Real-time applications:** Well suited to chatbots, assistants and other live-feedback systems.
- **Integration with LangChain:** Works with ChatOpenAI and other LangChain chat models that support streaming.
- **Underlying technology:** This tutorial uses Server-Sent Events (SSE) to stream data to the front-end.
- **Flexible front-end support:** Can be combined with plain JavaScript or frameworks like React for live updates.
- **Extensible:** Can be extended with conversation memory, multi-user setups and different LLM models.

Implementation

Step 1: Set Up the Environment

First, create a Python virtual environment:

```bash
python -m venv .venv
```

Step 2: Activate Environment

Next, activate the environment.

**1. Windows:**

```bash
.venv\Scripts\activate
```

**2. Linux / Mac:**

```bash
source .venv/bin/activate
```

Step 3: Install packages

Install the required packages:

```bash
pip install --upgrade langchain langchain-openai python-dotenv fastapi uvicorn
```
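Before moving on, it can help to confirm that the virtual environment is active and that the packages were actually installed. The following is a minimal sketch that uses only the standard library. The helper names `in_virtualenv` and `installed_versions` are made up for this illustration, and the package list simply mirrors the pip command above.

```python
import sys
from importlib import metadata


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    required = ["langchain", "langchain-openai", "python-dotenv", "fastapi", "uvicorn"]
    print("virtual environment active:", in_virtualenv())
    for name, version in installed_versions(required).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

If any package is reported as missing, the most likely cause is that pip ran outside the activated environment.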

Step 4: API Key Setup

The application reads the OpenAI API key from the environment. Create a .env file in the project directory and store the key in it:

```ini
OPENAI_API_KEY=your_openai_api_key_here
```
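To make the role of the .env file concrete, the sketch below imitates, in simplified form, what python-dotenv's `load_dotenv()` does by default: read KEY=VALUE lines and copy them into the process environment without overwriting variables that are already set. It is an illustration only, not python-dotenv's actual implementation (which also handles details such as `export` prefixes and variable expansion). The names `parse_env_text`, `apply_env` and `require_env` are hypothetical helpers.

```python
import os


def parse_env_text(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Simplification: strip surrounding whitespace and quote characters.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def apply_env(values, override=False):
    """Copy values into os.environ; existing variables win unless override=True."""
    applied = {}
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
            applied[key] = value
    return applied


def require_env(name):
    """Return the variable's value, or fail early with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise ValueError(f"{name} is not set; add it to .env or export it")
    return value
```

Because existing variables take precedence by default, a key exported in the shell is used in preference to the one stored in the .env file.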

Step 5: Build the Streaming LLM Backend

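Next, build the streaming backend. Create a server file in the project directory; here it is named server_stream_better.py.

- streaming=True enables token-by-token generation.
- event_stream() yields each chunk in the SSE format.
- The /stream endpoint sends real-time data to the front-end.

```python
import os

from dotenv import load_dotenv
from fastapi import FastAPI
from fastapi.responses import HTMLResponse, StreamingResponse
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain_openai import ChatOpenAI

# Load OPENAI_API_KEY from .env and fail early if it is missing.
load_dotenv()
if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY not found in .env")

app = FastAPI()
llm = ChatOpenAI(streaming=True, temperature=0.7, model="gpt-4o-mini")

# The user's text is passed in as a variable so that braces in the
# prompt are not mistaken for template placeholders.
prompt_template = ChatPromptTemplate.from_messages(
    [HumanMessagePromptTemplate.from_template("{user_input}")]
)


def event_stream(prompt_text: str):
    """Yield the model's output as Server-Sent Events, one event per chunk."""
    messages = prompt_template.format_messages(user_input=prompt_text)
    for chunk in llm.stream(messages):
        if not chunk.content:
            continue  # skip empty chunks
        # Each line of the payload gets its own "data:" field so that
        # newlines in the model output survive SSE framing.
        lines = chunk.content.split("\n")
        yield "".join(f"data: {line}\n" for line in lines) + "\n"


@app.get("/stream")
def stream_response(prompt: str):
    """Stream the response to the given prompt as text/event-stream."""
    return StreamingResponse(
        event_stream(prompt), media_type="text/event-stream"
    )
```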
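SSE framing has one subtlety that is easier to see in isolation: an event ends at the first blank line, so a payload that itself contains a newline has to be split across several `data:` lines. The sketch below shows that rule as a standalone function with no model or web framework involved. `format_sse` and `sse_events` are illustrative names, not part of FastAPI or LangChain.

```python
def format_sse(data):
    """Frame `data` as one Server-Sent Events message."""
    # Every line of the payload needs its own "data:" field; the client
    # joins the fields of one event back together with "\n".
    fields = [f"data: {part}" for part in data.split("\n")]
    # A blank line terminates the event.
    return "\n".join(fields) + "\n\n"


def sse_events(chunks):
    """Turn an iterable of text chunks into SSE messages, skipping empty ones."""
    for text in chunks:
        if text:
            yield format_sse(text)


if __name__ == "__main__":
    for message in sse_events(["Hello", "", ",\nworld"]):
        print(repr(message))
```

Writing `data: <chunk>` followed by a blank line works as long as a chunk never contains a newline. A chunk such as a paragraph break would otherwise end the event early, and the line break would be lost on the client.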

Step 6: Add a Live HTML Front-End

Now add a minimal front-end to the same file:

- Displays output live as tokens arrive.
- Auto-scroll keeps the latest content visible.
- A simple input box and button send the prompt.

```python
@app.get("/", response_class=HTMLResponse)
def home():
    return """
<!DOCTYPE html>
<html>
<head>
    <title>Interactive LLM Streaming</title>
</head>
<body>
    <h1>Interactive LLM Streaming</h1>
    <!-- Minimal markup; the ids match the script below, the styling is illustrative. -->
    <input type="text" id="prompt">
    <button onclick="startStream()">Send</button>
    <div id="output" style="white-space: pre-wrap; max-height: 300px; overflow-y: auto;"></div>

    <script>
        let evtSource;

        function startStream() {
            const output = document.getElementById("output");
            const prompt = document.getElementById("prompt").value.trim();

            if (!prompt) {
                alert("Please enter a prompt!");
                return;
            }

            output.textContent = "";

            if (evtSource) evtSource.close();

            const url = "/stream?prompt=" + encodeURIComponent(prompt);
            evtSource = new EventSource(url);

            evtSource.onmessage = function(e) {
                output.textContent += e.data;
                output.scrollTop = output.scrollHeight;
            };

            evtSource.onerror = function() {
                output.textContent += "\\n[Connection closed]";
                evtSource.close();
            };
        }
    </script>
</body>
</html>
"""
```

Step 7: Run the Application

Start the FastAPI server by running the following command in the terminal:

```bash
uvicorn server_stream_better:app --reload
```

After a successful startup, the terminal shows output like the following:

[Screenshot: Terminal]

Now open the browser and go to: http://127.0.0.1:8000

[Screenshot: Browser]

This is the interface. Enter any prompt and click Send; the tokens appear live as the model generates them.

[Screenshot: Response]
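The endpoint can also be exercised without a browser, which is handy for seeing exactly what goes over the wire. The sketch below assumes the server is running locally on the default uvicorn port. `build_stream_url` and `dump_raw_stream` are hypothetical helper names, and `quote` is used as a rough Python counterpart of the `encodeURIComponent` call in the page (the two differ slightly in which characters they leave unescaped).

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_stream_url(prompt, base_url="http://127.0.0.1:8000"):
    """Build the /stream URL with the prompt percent-encoded as a query value."""
    return f"{base_url}/stream?prompt={quote(prompt, safe='')}"


def dump_raw_stream(prompt, base_url="http://127.0.0.1:8000"):
    """Print each raw line of the SSE response; requires the server to be running."""
    with urlopen(build_stream_url(prompt, base_url)) as response:
        for raw_line in response:  # the response body is read line by line
            # repr() makes the "data:" prefix and the blank separator lines visible.
            print(repr(raw_line.decode("utf-8")), flush=True)
```

With the server up, calling `dump_raw_stream("Tell me a joke")` should print `data:` lines separated by blank lines while the model is still generating.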

How Streaming Works

**1. LLM Streaming**

- streaming=True in ChatOpenAI enables token-by-token output, and llm.stream() yields each chunk as it arrives.
- Without streaming, the full response is available only after generation finishes.
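The difference is mostly about when the first text becomes available. The sketch below simulates it with a fake generator instead of a real model call, so the timings are artificial: `fake_token_stream`, `consume_streaming` and the `delay` value are assumptions made purely for illustration.

```python
import time


def fake_token_stream(tokens, delay=0.01):
    """Yield tokens one at a time, sleeping to imitate generation time."""
    for token in tokens:
        time.sleep(delay)
        yield token


def consume_streaming(tokens, delay=0.01):
    """Return (seconds until first token, seconds until last token, full text)."""
    start = time.perf_counter()
    first_at = None
    parts = []
    for token in fake_token_stream(tokens, delay):
        if first_at is None:
            # With streaming, the caller can show this token immediately.
            first_at = time.perf_counter() - start
        parts.append(token)
    total = time.perf_counter() - start
    return first_at, total, "".join(parts)


if __name__ == "__main__":
    tokens = ["Stream", "ing ", "shortens ", "the ", "wait ", "for ", "the ", "first ", "word."]
    first, total, text = consume_streaming(tokens)
    print(f"first token after {first:.3f}s, full text after {total:.3f}s")
    print(text)
```

A non-streaming call corresponds to waiting for `total` before showing anything, whereas a streaming client can start rendering after `first`.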

**2. Server-Sent Events (SSE)**

- The /stream endpoint sends each token in the format data: <token>\n\n.
- The browser receives the live updates through EventSource.
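To show what the browser does with those lines, here is a small sketch of the receiving side written in Python: it collects `data:` fields until a blank line and then emits one message, which is roughly what `EventSource` hands to `onmessage` as `e.data`. It is a simplified reading of the SSE format (the `event`, `id` and `retry` fields are ignored), and `parse_sse` is an illustrative name.

```python
def parse_sse(lines):
    """Yield the data payload of each event from an iterable of SSE lines."""
    data_parts = []
    for raw_line in lines:
        line = raw_line.rstrip("\r\n")
        if line == "":
            # Blank line: the event is complete, so emit it if it has data.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
        elif line.startswith(":"):
            continue  # comment line, ignored
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Exactly one leading space after the colon is dropped.
            if value.startswith(" "):
                value = value[1:]
            data_parts.append(value)
        # Other fields (event, id, retry) are ignored in this sketch.


if __name__ == "__main__":
    wire = ["data: Hello\n", "\n", "data:  world\n", "\n"]
    print("".join(parse_sse(wire)))
```

Only one space after the colon is removed, so a token that begins with a space (as most tokens after the first do) keeps it, and the concatenated text stays correctly spaced.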

**3. Front-End**

- JavaScript appends each token to the output element dynamically.
- Auto-scroll keeps the latest text visible.

Together, these pieces create a real-time interactive experience.


Applications of Streaming in LangChain

- **Real-Time Chatbots:** Enables token-by-token replies for faster, human-like conversations.
- **Coding Assistants:** Streams code output live, improving interaction and usability.
- **Learning Platforms:** Provides gradual explanations or hints in educational tools.
- **Content Generation:** Allows live story writing or copy generation as text unfolds.
- **Data Summarization:** Streams ongoing summaries of documents or logs in real time.
- **Voice and Speech Systems:** Powers responsive voice assistants with live transcription.
- **Collaborative Tools:** Supports multi-user AI brainstorming or writing platforms.