Streaming Responses in LangChain

Last Updated : 27 Oct, 2025

Streaming responses in LangChain is a method that allows developers to receive output from a language model incrementally, token by token, instead of waiting for the entire response to complete. This approach creates a more interactive and responsive user experience, similar to live typing in chat applications. By leveraging streaming, applications can provide immediate feedback, reduce perceived latency and enable real-time interaction with LLMs.
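The difference between waiting for a full response and receiving it incrementally can be illustrated without any model at all. The following is a minimal sketch using a plain Python generator; `fake_model` and the token list are hypothetical stand-ins, not part of LangChain. The log shows that each token is displayed before the next one has been generated.

```python
from typing import Iterator, List


def fake_model(tokens: List[str], log: List[str]) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model: yields one token at a time."""
    for token in tokens:
        log.append(f"generated {token!r}")
        yield token


log: List[str] = []
for token in fake_model(["Hi", " there"], log):
    # The consumer handles each token as soon as it is yielded.
    log.append(f"displayed {token!r}")

print(log)
```

Because the generator is lazy, generation and display interleave, which is the same effect streaming gives a chat interface.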

Implementation

Step 1: Set Up the Environment

Create a Python virtual environment:

```bash
python -m venv .venv
```

Step 2: Activate Environment

Activate the environment:

**1. Windows:**

```bash
.venv\Scripts\activate
```

**2. Linux / Mac:**

```bash
source .venv/bin/activate
```

Step 3: Install packages

Install the necessary packages:

```bash
pip install --upgrade langchain langchain-openai python-dotenv fastapi uvicorn
```

Step 4: API Key Setup

The application reads the OpenAI API key from the environment. Create a .env file in the project directory and store the key in it:

```ini
OPENAI_API_KEY=your_openai_api_key_here
```

Step 5: Build the Streaming LLM Backend

Next, build the streaming backend. Create a Python file in the project directory; here it is named server_stream_better.py:

```python
import os
import time

from dotenv import load_dotenv
from fastapi import FastAPI
from fastapi.responses import HTMLResponse, StreamingResponse
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain_openai import ChatOpenAI

# Load OPENAI_API_KEY from .env and fail fast if it is missing.
load_dotenv()
if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY not found in .env")

app = FastAPI()
llm = ChatOpenAI(streaming=True, temperature=0.7, model="gpt-4o-mini")

# Pass the user text in as a variable so that braces in it are not
# interpreted as template placeholders.
prompt = ChatPromptTemplate.from_messages([
    HumanMessagePromptTemplate.from_template("{user_input}")
])


def event_stream(prompt_text: str):
    for chunk in llm.stream(prompt.format_messages(user_input=prompt_text)):
        # Emit one "data:" line per line of text so newlines survive SSE framing;
        # the blank line at the end terminates the event.
        lines = chunk.content.split("\n")
        yield "".join(f"data: {line}\n" for line in lines) + "\n"
        time.sleep(0.01)


@app.get("/stream")
def stream_response(prompt: str):
    return StreamingResponse(
        event_stream(prompt),
        media_type="text/event-stream",
    )
```
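The streaming generator can be exercised without an API key by swapping in a stub for the model. The following is a minimal sketch under stated assumptions: `FakeChunk` and `FakeLLM` are hypothetical stand-ins that only mimic the `.stream()` call and the `.content` attribute used by the server, and this `event_stream` takes the model as a parameter, which the server code above does not.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a streamed message chunk; only .content is used."""
    content: str


class FakeLLM:
    """Hypothetical stub that mimics the .stream() call used by the server."""

    def __init__(self, pieces: Iterable[str]):
        self.pieces = list(pieces)

    def stream(self, _messages) -> Iterator[FakeChunk]:
        for piece in self.pieces:
            yield FakeChunk(piece)


def event_stream(llm, prompt_text: str) -> Iterator[str]:
    """Wrap each chunk in an SSE event: data lines followed by a blank line."""
    for chunk in llm.stream(prompt_text):
        # One "data:" line per line of text, so embedded newlines are preserved.
        lines = chunk.content.split("\n")
        yield "".join(f"data: {line}\n" for line in lines) + "\n"


frames = list(event_stream(FakeLLM(["Hel", "lo", "\nBye"]), "Say hello"))
for frame in frames:
    print(repr(frame))
```

Each frame ends with a blank line, which is what tells an SSE client that one event is complete.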

Step 6: Add a Live HTML Front-End

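Next, extend the same file with a minimal front-end. The page needs only a prompt field, a Send button, and an output area for the script to write into:

```python
@app.get("/", response_class=HTMLResponse)
def home():
    return """
<!DOCTYPE html>
<html>
<head>
    <title>Interactive LLM Streaming</title>
</head>
<body>
    <h2>Interactive LLM Streaming</h2>
    <!-- Minimal markup: the script only needs the ids "prompt" and "output". -->
    <textarea id="prompt"></textarea>
    <button onclick="startStream()">Send</button>
    <pre id="output"></pre>

    <script>
        let evtSource;

        function startStream() {
            const output = document.getElementById("output");
            const prompt = document.getElementById("prompt").value.trim();

            if (!prompt) {
                alert("Please enter a prompt!");
                return;
            }

            output.textContent = "";

            // Close any stream that is still open before starting a new one.
            if (evtSource) evtSource.close();

            const url = "/stream?prompt=" + encodeURIComponent(prompt);
            evtSource = new EventSource(url);

            // Append each streamed chunk to the output as it arrives.
            evtSource.onmessage = function(e) {
                output.textContent += e.data;
                output.scrollTop = output.scrollHeight;
            };

            // Fires when the server ends the stream or the connection fails.
            evtSource.onerror = function() {
                output.textContent += "\\n[Connection closed]";
                evtSource.close();
            };
        }
    </script>
</body>
</html>
"""
```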
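What the browser's EventSource does with the incoming stream can be sketched in Python as well. The following is a simplified illustration, not a full SSE client: `iter_sse_data` is a hypothetical helper that handles only `data:` fields and ignores `event:`, `id:`, `retry:` and comment lines.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops one optional space after the colon.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
        # Other fields and comments are ignored in this sketch.


stream = ["data: Hel\n", "\n", "data: lo\n", "\n", "data:  world\n", "\n"]
text = "".join(iter_sse_data(stream))
print(text)
```

The same function could be fed decoded lines from any HTTP client that reads the /stream response incrementally, which is useful for testing the endpoint from a script instead of a browser.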

Step 7: Run the Application

Start the FastAPI server by running the following command in the terminal:

```bash
uvicorn server_stream_better:app --reload
```

After a successful startup, the terminal shows the server's startup output:

[Screenshot: Terminal]

Now open the browser and go to: http://127.0.0.1:8000

[Screenshot: Browser Terminal]

The interface appears in the browser. Enter any prompt and click Send; tokens appear live as the model generates them.

[Screenshot: Response]

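Let's understand how streaming works:

**1. LLM Streaming:** The call to llm.stream() yields the response in chunks as the model generates them, instead of returning one complete message.

**2. Server-Sent Events (SSE):** FastAPI's StreamingResponse sends each chunk to the browser as a data: event over a single HTTP connection, using the text/event-stream media type.

**3. Front-End:** The browser's EventSource receives each event in its onmessage handler and appends the text to the output area.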

The source code can be downloaded from here.

Applications of Streaming in LangChain