OpenAI | liteLLM

LiteLLM supports OpenAI Chat + Embedding calls.

Tip: We recommend using litellm.responses() / the Responses API for the latest OpenAI models (GPT-5, gpt-5-codex, o3-mini, etc.).

Required API Keys

import os 
os.environ["OPENAI_API_KEY"] = "your-api-key"

Usage

import os 
from litellm import completion

os.environ["OPENAI_API_KEY"] = "your-api-key"

# openai call
response = completion(
    model = "gpt-4o", 
    messages=[{ "content": "Hello, how are you?","role": "user"}]
)

Metadata passthrough (preview)

When litellm.enable_preview_features = True, LiteLLM forwards only the values inside metadata to OpenAI.

completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hi"}],
    metadata= {"custom_meta_key": "value"},
)

Usage - LiteLLM Proxy Server

Here's how to call OpenAI models with the LiteLLM Proxy Server

1. Save key in your environment

export OPENAI_API_KEY="your-api-key"

2. Start the proxy

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo                          # The `openai/` prefix will call openai.chat.completions.create
      api_key: os.environ/OPENAI_API_KEY
  - model_name: gpt-3.5-turbo-instruct
    litellm_params:
      model: text-completion-openai/gpt-3.5-turbo-instruct # The `text-completion-openai/` prefix will call openai.completions.create
      api_key: os.environ/OPENAI_API_KEY

3. Test it

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data ' {
      "model": "gpt-3.5-turbo",
      "messages": [
        {
          "role": "user",
          "content": "what llm are you"
        }
      ]
    }
'
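
The same request can be built from Python. The following is a minimal sketch using only the standard library, assuming the proxy is reachable at http://0.0.0.0:4000 as in the curl example. The helper name build_proxy_request and its optional api_key argument are hypothetical, added here for illustration; the key is only relevant if your proxy is configured to require one.

```python
import json
import urllib.request


def build_proxy_request(base_url, model, prompt, api_key=None):
    """Build (but do not send) a POST to the proxy's /chat/completions route."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    headers = {"Content-Type": "application/json"}
    if api_key:  # assumption: only needed when the proxy enforces keys
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_proxy_request("http://0.0.0.0:4000", "gpt-3.5-turbo", "what llm are you")

# To send it against a running proxy:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Building the request separately from sending it makes it easy to inspect the URL and body before pointing it at a live proxy.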

Optional Keys - OpenAI Organization, OpenAI API Base

import os 
os.environ["OPENAI_ORGANIZATION"] = "your-org-id"       # OPTIONAL
os.environ["OPENAI_BASE_URL"] = "https://your_host/v1"     # OPTIONAL

OpenAI Chat Completion Models

Model Name Function Call
gpt-5 response = completion(model="gpt-5", messages=messages)
gpt-5-mini response = completion(model="gpt-5-mini", messages=messages)
gpt-5-nano response = completion(model="gpt-5-nano", messages=messages)
gpt-5-chat response = completion(model="gpt-5-chat", messages=messages)
gpt-5-chat-latest response = completion(model="gpt-5-chat-latest", messages=messages)
gpt-5-2025-08-07 response = completion(model="gpt-5-2025-08-07", messages=messages)
gpt-5-mini-2025-08-07 response = completion(model="gpt-5-mini-2025-08-07", messages=messages)
gpt-5-nano-2025-08-07 response = completion(model="gpt-5-nano-2025-08-07", messages=messages)
gpt-5-pro response = completion(model="gpt-5-pro", messages=messages)
gpt-5.2 response = completion(model="gpt-5.2", messages=messages)
gpt-5.2-2025-12-11 response = completion(model="gpt-5.2-2025-12-11", messages=messages)
gpt-5.2-chat-latest response = completion(model="gpt-5.2-chat-latest", messages=messages)
gpt-5.3-chat-latest response = completion(model="gpt-5.3-chat-latest", messages=messages)
gpt-5.4 response = completion(model="gpt-5.4", messages=messages)
gpt-5.4-2026-03-05 response = completion(model="gpt-5.4-2026-03-05", messages=messages)
gpt-5.5 response = completion(model="gpt-5.5", messages=messages)
gpt-5.5-2026-04-23 response = completion(model="gpt-5.5-2026-04-23", messages=messages)
gpt-5.2-pro response = completion(model="gpt-5.2-pro", messages=messages)
gpt-5.2-pro-2025-12-11 response = completion(model="gpt-5.2-pro-2025-12-11", messages=messages)
gpt-5.4-pro response = completion(model="gpt-5.4-pro", messages=messages)
gpt-5.4-pro-2026-03-05 response = completion(model="gpt-5.4-pro-2026-03-05", messages=messages)
gpt-5.5-pro response = completion(model="gpt-5.5-pro", messages=messages)
gpt-5.5-pro-2026-04-23 response = completion(model="gpt-5.5-pro-2026-04-23", messages=messages)
gpt-5.1 response = completion(model="gpt-5.1", messages=messages)
gpt-5.1-codex response = completion(model="gpt-5.1-codex", messages=messages)
gpt-5.1-codex-mini response = completion(model="gpt-5.1-codex-mini", messages=messages)
gpt-5.1-codex-max response = completion(model="gpt-5.1-codex-max", messages=messages)
gpt-4.1 response = completion(model="gpt-4.1", messages=messages)
gpt-4.1-mini response = completion(model="gpt-4.1-mini", messages=messages)
gpt-4.1-nano response = completion(model="gpt-4.1-nano", messages=messages)
o4-mini response = completion(model="o4-mini", messages=messages)
o3-mini response = completion(model="o3-mini", messages=messages)
o3 response = completion(model="o3", messages=messages)
o1-mini response = completion(model="o1-mini", messages=messages)
o1-preview response = completion(model="o1-preview", messages=messages)
gpt-4o-mini response = completion(model="gpt-4o-mini", messages=messages)
gpt-4o-mini-2024-07-18 response = completion(model="gpt-4o-mini-2024-07-18", messages=messages)
gpt-4o response = completion(model="gpt-4o", messages=messages)
gpt-4o-2024-08-06 response = completion(model="gpt-4o-2024-08-06", messages=messages)
gpt-4o-2024-05-13 response = completion(model="gpt-4o-2024-05-13", messages=messages)
gpt-4-turbo response = completion(model="gpt-4-turbo", messages=messages)
gpt-4-turbo-preview response = completion(model="gpt-4-turbo-preview", messages=messages)
gpt-4-0125-preview response = completion(model="gpt-4-0125-preview", messages=messages)
gpt-4-1106-preview response = completion(model="gpt-4-1106-preview", messages=messages)
gpt-3.5-turbo-1106 response = completion(model="gpt-3.5-turbo-1106", messages=messages)
gpt-3.5-turbo response = completion(model="gpt-3.5-turbo", messages=messages)
gpt-3.5-turbo-0301 response = completion(model="gpt-3.5-turbo-0301", messages=messages)
gpt-3.5-turbo-0613 response = completion(model="gpt-3.5-turbo-0613", messages=messages)
gpt-3.5-turbo-16k response = completion(model="gpt-3.5-turbo-16k", messages=messages)
gpt-3.5-turbo-16k-0613 response = completion(model="gpt-3.5-turbo-16k-0613", messages=messages)
gpt-4 response = completion(model="gpt-4", messages=messages)
gpt-4-0314 response = completion(model="gpt-4-0314", messages=messages)
gpt-4-0613 response = completion(model="gpt-4-0613", messages=messages)
gpt-4-32k response = completion(model="gpt-4-32k", messages=messages)
gpt-4-32k-0314 response = completion(model="gpt-4-32k-0314", messages=messages)
gpt-4-32k-0613 response = completion(model="gpt-4-32k-0613", messages=messages)

These also support the OPENAI_BASE_URL environment variable, which can be used to specify a custom API endpoint.

OpenAI Web Search Models

OpenAI has two ways to use web search, depending on the endpoint:

Approach | Endpoint | Models | How to enable
Search Models | /chat/completions | gpt-5-search-api, gpt-4o-search-preview, gpt-4o-mini-search-preview | Pass the web_search_options parameter
Web Search Tool | /responses | gpt-5, gpt-4.1, gpt-4o, and other regular models | Pass the web_search_preview tool

from litellm import completion

response = completion(
    model="openai/gpt-5-search-api",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    web_search_options={
        "search_context_size": "medium"  # Options: "low", "medium", "high"
    }
)

For full details, see the Web Search guide.
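
The choice between the two approaches can be sketched as a small helper that returns keyword arguments for litellm.completion(). This is only an illustration: web_search_kwargs and SEARCH_MODELS are hypothetical names, the model set is copied from the table above, and routing regular models through the openai/responses/ prefix is one way (assumed here) to reach the /responses endpoint from the completion interface.

```python
# Search models listed in the table above (served via /chat/completions).
SEARCH_MODELS = {
    "gpt-5-search-api",
    "gpt-4o-search-preview",
    "gpt-4o-mini-search-preview",
}


def web_search_kwargs(model, prompt, context_size="medium"):
    """Return kwargs for litellm.completion() with web search enabled."""
    messages = [{"role": "user", "content": prompt}]
    bare_model = model.rsplit("/", 1)[-1]  # drop any provider/route prefix
    if bare_model in SEARCH_MODELS:
        # Search models take the web_search_options parameter.
        return {
            "model": f"openai/{bare_model}",
            "messages": messages,
            "web_search_options": {"search_context_size": context_size},
        }
    # Regular models use the web_search_preview tool on the Responses API.
    return {
        "model": f"openai/responses/{bare_model}",
        "messages": messages,
        "tools": [{"type": "web_search_preview"}],
    }


search_kwargs = web_search_kwargs("gpt-4o-search-preview", "What is the capital of France?")
tool_kwargs = web_search_kwargs("gpt-4o", "What is the capital of France?")
```

The result can then be passed on as litellm.completion(**search_kwargs).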

OpenAI Vision Models

Model Name Function Call
gpt-4o response = completion(model="gpt-4o", messages=messages)
gpt-4-turbo response = completion(model="gpt-4-turbo", messages=messages)
gpt-4-vision-preview response = completion(model="gpt-4-vision-preview", messages=messages)

Usage

import os 
from litellm import completion

os.environ["OPENAI_API_KEY"] = "your-api-key"

# openai call
response = completion(
    model = "gpt-4-vision-preview", 
    messages=[
        {
            "role": "user",
            "content": [
                            {
                                "type": "text",
                                "text": "What’s in this image?"
                            },
                            {
                                "type": "image_url",
                                "image_url": {
                                "url": "https://awsmp-logos.s3.amazonaws.com/seller-xw5kijmvmzasy/c233c9ade2ccb5491072ae232c814942.png"
                                }
                            }
                        ]
        }
    ],
)
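
When the image is a local file rather than a public URL, the image_url field can carry a base64 data URL instead. The sketch below assumes that form is accepted for the model you call; image_part and vision_messages are hypothetical helper names, and the bytes used here are a placeholder rather than a real image.

```python
import base64


def image_part(image_bytes, mime_type="image/png"):
    """Encode raw image bytes as an image_url content part (data URL form)."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def vision_messages(question, image_bytes, mime_type="image/png"):
    """Build a single-turn messages list with a text part and an image part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                image_part(image_bytes, mime_type),
            ],
        }
    ]


# Placeholder bytes; in practice read them with open(path, "rb").read()
messages = vision_messages("What's in this image?", b"placeholder image bytes")
```

The resulting messages list can be passed to completion(model="gpt-4o", messages=messages).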

PDF File Parsing

OpenAI has a file message type that lets you pass in a PDF file and have it parsed into a structured output.

import base64
from litellm import completion

with open("draconomicon.pdf", "rb") as f:
    data = f.read()

base64_string = base64.b64encode(data).decode("utf-8")

# Store the result under a new name so the imported completion() is not shadowed
response = completion(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "file",
                    "file": {
                        "filename": "draconomicon.pdf",
                        "file_data": f"data:application/pdf;base64,{base64_string}",
                    }
                },
                {
                    "type": "text",
                    "text": "What is the first dragon in the book?",
                }
            ],
        },
    ],
)

print(response.choices[0].message.content)

OpenAI Fine Tuned Models

Model Name Function Call
fine tuned gpt-4-0613 response = completion(model="ft:gpt-4-0613", messages=messages)
fine tuned gpt-4o-2024-05-13 response = completion(model="ft:gpt-4o-2024-05-13", messages=messages)
fine tuned gpt-3.5-turbo-0125 response = completion(model="ft:gpt-3.5-turbo-0125", messages=messages)
fine tuned gpt-3.5-turbo-1106 response = completion(model="ft:gpt-3.5-turbo-1106", messages=messages)
fine tuned gpt-3.5-turbo-0613 response = completion(model="ft:gpt-3.5-turbo-0613", messages=messages)

[BETA] Route all .completions requests to Responses API (better quality)

When enabled, LiteLLM sends OpenAI traffic from litellm.completion() and the proxy /chat/completions endpoint through the Responses API instead of Chat Completions. That path generally matches OpenAI’s latest model behavior and quality (for example, reasoning output on GPT‑5 class models).

You can opt in globally or per request:

Option A — per-request prefix: Use the openai/responses/ model prefix.

Option B — global flag (recommended): Set route_all_chat_openai_to_responses = True to automatically route all OpenAI /chat/completions requests through the Responses API, no model prefix needed.

import litellm

litellm.route_all_chat_openai_to_responses = True

response = litellm.completion(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    reasoning_effort="low",
)

Note: route_all_chat_openai_to_responses only applies to the openai provider. Azure OpenAI is unaffected. You can also set it via env var: LITELLM_ROUTE_ALL_CHAT_OPENAI_TO_RESPONSES=true.

Option A — per-request prefix: You can also prefix individual model names with openai/responses/ to route just that call through the Responses API.

response = litellm.completion(
    model="openai/responses/gpt-5-mini", # tells litellm to call the model via the Responses API
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    reasoning_effort="low",
)

Expected Response:

{
  "id": "chatcmpl-6382a222-43c9-40c4-856b-22e105d88075",
  "created": 1760146746,
  "model": "gpt-5-mini",
  "object": "chat.completion",
  "system_fingerprint": null,
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Paris",
        "role": "assistant",
        "tool_calls": null,
        "function_call": null,
        "reasoning_content": "**Identifying the capital**\n\nThe user wants me to think of the capital of France and write it down. That's pretty straightforward: it's Paris. There aren't any safety issues to consider here. I think it would be best to keep it concise, so maybe just \"Paris\" would suffice. I feel confident that I should just stick to that without adding anything else. So, let's write it down!",
        "provider_specific_fields": null
      }
    }
  ],
  "usage": {
    "completion_tokens": 7,
    "prompt_tokens": 18,
    "total_tokens": 25,
    "completion_tokens_details": null,
    "prompt_tokens_details": {
      "audio_tokens": null,
      "cached_tokens": 0,
      "text_tokens": null,
      "image_tokens": null
    }
  }
}
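
A short sketch of reading the answer and the reasoning text out of a response shaped like the JSON above. It assumes the response has already been converted to a plain dict; split_answer_and_reasoning is a hypothetical helper name, and the sample below is a trimmed copy of the expected response.

```python
def split_answer_and_reasoning(response):
    """Return (content, reasoning_content) from the first choice."""
    message = response["choices"][0]["message"]
    # reasoning_content may be missing or null when the model returns no reasoning
    return message.get("content"), message.get("reasoning_content")


# Trimmed copy of the expected response shown above.
sample = {
    "model": "gpt-5-mini",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "Paris",
                "role": "assistant",
                "reasoning_content": "**Identifying the capital**\n\n...",
            },
        }
    ],
    "usage": {"completion_tokens": 7, "prompt_tokens": 18, "total_tokens": 25},
}

answer, reasoning = split_answer_and_reasoning(sample)
print(answer)
```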

Advanced: Using reasoning_effort with summary field

By default, reasoning_effort accepts a string value ("none", "minimal", "low", "medium", "high", or "xhigh") and only sets the effort level, without including a reasoning summary. "xhigh" is only supported on gpt-5.1-codex-max and on gpt-5.2 and later models (see the table below).

To opt in to the summary feature, pass reasoning_effort as a dictionary. Note: The summary field requires your OpenAI organization to be verified. Using summary without verification results in a 400 error from OpenAI.

# Option 1: String format (default - no summary)
response = litellm.completion(
    model="openai/responses/gpt-5-mini",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    reasoning_effort="high"  # Only sets effort level
)

# Option 2: Dict format (with optional summary - requires org verification)
response = litellm.completion(
    model="openai/responses/gpt-5-mini",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    reasoning_effort={"effort": "high", "summary": "auto"}  # "auto", "detailed", or "concise" (not all supported by all models)
)

Summary field options: "auto", "detailed", and "concise".

Note: GPT-5 series models support "auto" and "detailed", but do not support "concise". O-series models (o3-pro, o4-mini, o3) support all three options. Some models like o3-mini and o1 do not support reasoning summaries at all.

Supported reasoning_effort values by model:

Model | Default (when not set) | Supported Values
gpt-5.1 | none | none, low, medium, high
gpt-5 | medium | minimal, low, medium, high
gpt-5-mini | medium | minimal, low, medium, high
gpt-5-nano | none | none, low, medium, high
gpt-5-codex | adaptive | low, medium, high (no minimal)
gpt-5.1-codex | adaptive | low, medium, high (no minimal)
gpt-5.1-codex-mini | adaptive | low, medium, high (no minimal)
gpt-5.1-codex-max | adaptive | low, medium, high, xhigh (no minimal)
gpt-5.2 | medium | none, low, medium, high, xhigh
gpt-5.2-pro | high | low, medium, high, xhigh
gpt-5.5 | medium | none, minimal, low, medium, high, xhigh
gpt-5.5-pro | high | minimal, low, medium, high, xhigh
gpt-5-pro | high | high only

Note: See the OpenAI Reasoning documentation for more details on organization verification requirements.
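
The table of supported values can be turned into a client-side check that runs before a request is sent. This is a sketch only: SUPPORTED_EFFORTS is transcribed from the table above, check_reasoning_effort is a hypothetical helper name, and treating unknown models as allowed is an assumption made here so that the API remains the final authority.

```python
# Transcribed from the "Supported reasoning_effort values by model" table.
SUPPORTED_EFFORTS = {
    "gpt-5.1": ("none", "low", "medium", "high"),
    "gpt-5": ("minimal", "low", "medium", "high"),
    "gpt-5-mini": ("minimal", "low", "medium", "high"),
    "gpt-5-nano": ("none", "low", "medium", "high"),
    "gpt-5-codex": ("low", "medium", "high"),
    "gpt-5.1-codex": ("low", "medium", "high"),
    "gpt-5.1-codex-mini": ("low", "medium", "high"),
    "gpt-5.1-codex-max": ("low", "medium", "high", "xhigh"),
    "gpt-5.2": ("none", "low", "medium", "high", "xhigh"),
    "gpt-5.2-pro": ("low", "medium", "high", "xhigh"),
    "gpt-5.5": ("none", "minimal", "low", "medium", "high", "xhigh"),
    "gpt-5.5-pro": ("minimal", "low", "medium", "high", "xhigh"),
    "gpt-5-pro": ("high",),
}


def check_reasoning_effort(model, effort):
    """Return True if the table lists `effort` for `model` (or the model is unknown)."""
    bare_model = model.rsplit("/", 1)[-1]  # accept "openai/responses/gpt-5-mini" too
    supported = SUPPORTED_EFFORTS.get(bare_model)
    if supported is None:
        return True  # assumption: let the API decide for models not in the table
    return effort in supported


print(check_reasoning_effort("openai/responses/gpt-5-mini", "minimal"))
print(check_reasoning_effort("gpt-5-pro", "low"))
```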

Multi-turn Conversations with reasoning_items

For multi-turn conversations you need reasoning_items: structured blocks that include the encrypted_content token OpenAI uses to restore reasoning state on the next request. Pass include=["reasoning.encrypted_content"] on every call where you want that token returned.

Non-streaming: round-trip reasoning_items

import litellm

messages = [{"role": "user", "content": "Solve this step by step: 2 + 2"}]

# Turn 1: request reasoning_items (with encrypted_content)
response = litellm.completion(
    model="openai/responses/gpt-5-mini",
    messages=messages,
    reasoning_effort="low",
    include=["reasoning.encrypted_content"],
)

assistant_msg = response.choices[0].message

# Turn 2 — pass reasoning_items back; LiteLLM converts to the correct Responses API format
messages.append({
    "role": "assistant",
    "content": assistant_msg.content,
    "reasoning_items": assistant_msg.reasoning_items,
})
messages.append({"role": "user", "content": "Now summarize your reasoning."})

response2 = litellm.completion(
    model="openai/responses/gpt-5-mini",
    messages=messages,
    reasoning_effort="low",
    include=["reasoning.encrypted_content"],
)
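
For longer conversations, the "append the assistant turn" step can be factored into a helper. The sketch below is an illustration under stated assumptions: assistant_turn is a hypothetical name, the SimpleNamespace object stands in for response.choices[0].message so the example runs without an API call, and the shape of the stand-in reasoning item is invented for the demo.

```python
from types import SimpleNamespace


def assistant_turn(assistant_msg):
    """Convert an assistant message object into a dict for the next request."""
    turn = {"role": "assistant", "content": assistant_msg.content}
    reasoning_items = getattr(assistant_msg, "reasoning_items", None)
    if reasoning_items:  # only attach when the response actually carried them
        turn["reasoning_items"] = reasoning_items
    return turn


# Stand-in for response.choices[0].message (hypothetical item shape).
fake_msg = SimpleNamespace(
    content="4",
    reasoning_items=[{"type": "reasoning", "encrypted_content": "<opaque token>"}],
)

messages = [{"role": "user", "content": "Solve this step by step: 2 + 2"}]
messages.append(assistant_turn(fake_msg))
messages.append({"role": "user", "content": "Now summarize your reasoning."})
```

Omitting the key when no reasoning_items came back keeps the message valid for calls made without include=["reasoning.encrypted_content"].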

Verbosity Control for GPT-5 Models

The verbosity parameter controls the length and detail of responses from GPT-5 family models. It accepts three values: "low", "medium", or "high".

Supported models: gpt-5, gpt-5.1, gpt-5-mini, gpt-5-nano, gpt-5-pro

Note: GPT-5-Codex models (gpt-5-codex, gpt-5.1-codex, gpt-5.1-codex-mini, gpt-5.1-codex-max) do not support the verbosity parameter.

Use cases (low verbosity for concise answers, high verbosity for detailed explanations):

import litellm

# Low verbosity - concise responses
response = litellm.completion(
    model="gpt-5.1",
    messages=[{"role": "user", "content": "Write a function to reverse a string"}],
    verbosity="low"
)

# High verbosity - detailed responses
response = litellm.completion(
    model="gpt-5.1",
    messages=[{"role": "user", "content": "Explain how neural networks work"}],
    verbosity="high"
)
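
Because the Codex variants do not accept verbosity, shared code paths may want to add the parameter only when the model is in the supported list. A minimal sketch, assuming the supported list above is complete; with_verbosity and VERBOSITY_MODELS are hypothetical names.

```python
# Models listed as supporting verbosity in this section.
VERBOSITY_MODELS = {"gpt-5", "gpt-5.1", "gpt-5-mini", "gpt-5-nano", "gpt-5-pro"}
VERBOSITY_LEVELS = {"low", "medium", "high"}


def with_verbosity(kwargs, verbosity):
    """Return a copy of completion kwargs, adding verbosity when supported."""
    if verbosity not in VERBOSITY_LEVELS:
        raise ValueError(f"verbosity must be one of {sorted(VERBOSITY_LEVELS)}")
    bare_model = kwargs["model"].rsplit("/", 1)[-1]
    if bare_model not in VERBOSITY_MODELS:
        return dict(kwargs)  # e.g. Codex models: leave the request untouched
    return {**kwargs, "verbosity": verbosity}


base = {"model": "gpt-5.1", "messages": [{"role": "user", "content": "hi"}]}
concise = with_verbosity(base, "low")
codex = with_verbosity({**base, "model": "gpt-5.1-codex"}, "low")
```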

OpenAI Chat Completion to Responses API Bridge

LiteLLM offers a chat completion to Responses API bridge. This lets you use the completion interface while calling the Responses API under the hood.

This is useful when you want to use Responses API specific features (like built-in tools, web search preview, or code interpreter).

gpt-5.4+ + reasoning_effort + function tools

LiteLLM drops reasoning_effort from gpt-5.4 and newer (gpt-5.4, gpt-5.5, future 5.x releases) requests to litellm.completion() that include tools, since that combination is only supported in the Responses API.

If you need reasoning and tools together, use the responses bridge instead (LiteLLM also auto-routes these requests to /v1/responses when both tools and reasoning_effort are set):

response = litellm.completion(
    model="openai/responses/gpt-5.5",  # routes to /v1/responses
    messages=[{"role": "user", "content": "What's the weather?"}],
    tools=[...],
    reasoning_effort="low",
)

When to use the openai/responses/ prefix

Each model has a mode property defined in model_prices_and_context_window.json that determines which API endpoint it uses by default:

Models with mode: responses use the Responses API automatically (for example, o3-deep-research-2025-06-26).

Models with mode: chat require the openai/responses/ prefix for built-in tools (for example, gpt-4o).

To use built-in tools like web_search_preview with mode: chat models, add the openai/responses/ prefix:

# This will FAIL - gpt-4o has mode: chat, uses Chat Completions API
response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the weather in Paris today?"}],
    tools=[{"type": "web_search_preview"}],  # Not supported in Chat Completions
    # ... other kwargs
)

# This will WORK - prefix forces Responses API
response = litellm.completion(
    model="openai/responses/gpt-4o",
    messages=[{"role": "user", "content": "What is the weather in Paris today?"}],
    tools=[{"type": "web_search_preview"}],  # Supported in Responses API
    # ... other kwargs
)
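
The prefix rule can be sketched as a helper that only adds openai/responses/ when a request carries a built-in (non-function) tool. The names route_model and uses_builtin_tools are hypothetical, and the mode argument is assumed to be supplied by the caller (for example, looked up from model_prices_and_context_window.json).

```python
RESPONSES_PREFIX = "openai/responses/"


def uses_builtin_tools(tools):
    """True if any tool is not a plain function tool (e.g. web_search_preview)."""
    return any(tool.get("type") != "function" for tool in tools or [])


def route_model(model, tools=None, mode="chat"):
    """Return the model name to pass to litellm.completion()."""
    if model.startswith(RESPONSES_PREFIX) or mode == "responses":
        return model  # already routed to the Responses API
    if uses_builtin_tools(tools):
        bare_model = model[len("openai/"):] if model.startswith("openai/") else model
        return RESPONSES_PREFIX + bare_model
    return model


print(route_model("gpt-4o", [{"type": "web_search_preview"}]))
```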

Examples

Using a model with mode: responses (automatic):

import litellm
import os

os.environ["OPENAI_API_KEY"] = "sk-1234"

response = litellm.completion(
    model="o3-deep-research-2025-06-26",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    tools=[
        {"type": "web_search_preview"},
        {"type": "code_interpreter", "container": {"type": "auto"}},
    ],
)
print(response)

Using a model with mode: chat (requires prefix):

import litellm
import os

os.environ["OPENAI_API_KEY"] = "sk-1234"

# Use the openai/responses/ prefix to enable built-in tools
response = litellm.completion(
    model="openai/responses/gpt-4o",
    messages=[{"role": "user", "content": "What is the weather in Paris today?"}],
    tools=[
        {"type": "web_search_preview"},
    ],
)
print(response)

OpenAI Audio Transcription

LiteLLM supports the OpenAI Audio Transcription endpoint.

Supported models:

Model Name Function Call
whisper-1 response = transcription(model="whisper-1", file=audio_file)
gpt-4o-transcribe response = transcription(model="gpt-4o-transcribe", file=audio_file)
gpt-4o-mini-transcribe response = transcription(model="gpt-4o-mini-transcribe", file=audio_file)

from litellm import transcription
import os 

# set api keys 
os.environ["OPENAI_API_KEY"] = ""
audio_file = open("/path/to/audio.mp3", "rb")

response = transcription(model="gpt-4o-transcribe", file=audio_file)

print(f"response: {response}")

Advanced

Set litellm.return_response_headers = True to get the raw response headers from OpenAI.

When it is set, responses from litellm.completion() and litellm.embedding() always include the _response_headers field.

import litellm
from litellm import completion

litellm.return_response_headers = True

# /chat/completions
response = completion(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": "hi",
        }
    ],
)
print(f"response: {response}")
print("_response_headers=", response._response_headers)

Expected Response Headers from OpenAI

{
  "date": "Sat, 20 Jul 2024 22:05:23 GMT",
  "content-type": "application/json",
  "transfer-encoding": "chunked",
  "connection": "keep-alive",
  "access-control-allow-origin": "*",
  "openai-model": "text-embedding-ada-002",
  "openai-organization": "*****",
  "openai-processing-ms": "20",
  "openai-version": "2020-10-01",
  "strict-transport-security": "max-age=15552000; includeSubDomains; preload",
  "x-ratelimit-limit-requests": "5000",
  "x-ratelimit-limit-tokens": "5000000",
  "x-ratelimit-remaining-requests": "4999",
  "x-ratelimit-remaining-tokens": "4999999",
  "x-ratelimit-reset-requests": "12ms",
  "x-ratelimit-reset-tokens": "0s",
  "x-request-id": "req_cc37487bfd336358231a17034bcfb4d9",
  "cf-cache-status": "DYNAMIC",
  "set-cookie": "__cf_bm=E_FJY8fdAIMBzBE2RZI2.OkMIO3lf8Hz.ydBQJ9m3q8-1721513123-1.0.1.1-6OK0zXvtd5s9Jgqfz66cU9gzQYpcuh_RLaUZ9dOgxR9Qeq4oJlu.04C09hOTCFn7Hg.k.2tiKLOX24szUE2shw; path=/; expires=Sat, 20-Jul-24 22:35:23 GMT; domain=.api.openai.com; HttpOnly; Secure; SameSite=None, _cfuvid=SDndIImxiO3U0aBcVtoy1TBQqYeQtVDo1L6_Nlpp7EU-1721513123215-0.0.1.1-604800000; path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None",
  "x-content-type-options": "nosniff",
  "server": "cloudflare",
  "cf-ray": "8a66409b4f8acee9-SJC",
  "content-encoding": "br",
  "alt-svc": "h3=\":443\"; ma=86400"
}
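
One common use of these headers is tracking rate-limit headroom. The sketch below reads the x-ratelimit-* values from a headers dict like the one above. rate_limit_status and parse_reset are hypothetical helper names, and the duration parser assumes reset values are unit-suffixed strings such as "12ms", "0s", or "6m0s"; only the first two forms appear in the sample.

```python
import re

_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_reset(value):
    """Convert a duration string such as '12ms' or '6m0s' to seconds."""
    return sum(float(num) * _UNIT_SECONDS[unit] for num, unit in _DURATION_PART.findall(value))


def rate_limit_status(headers):
    """Summarize remaining quota and reset times from response headers."""
    return {
        "remaining_requests": int(headers["x-ratelimit-remaining-requests"]),
        "remaining_tokens": int(headers["x-ratelimit-remaining-tokens"]),
        "requests_reset_s": parse_reset(headers["x-ratelimit-reset-requests"]),
        "tokens_reset_s": parse_reset(headers["x-ratelimit-reset-tokens"]),
    }


# Values copied from the expected headers above.
sample_headers = {
    "x-ratelimit-remaining-requests": "4999",
    "x-ratelimit-remaining-tokens": "4999999",
    "x-ratelimit-reset-requests": "12ms",
    "x-ratelimit-reset-tokens": "0s",
}
status = rate_limit_status(sample_headers)
print(status["remaining_requests"], status["remaining_tokens"])
```

With a live response, the same function could be called as rate_limit_status(response._response_headers).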

Parallel Function calling

See the detailed walkthrough of parallel function calling with LiteLLM.

import litellm
import json
# set openai api key
import os
os.environ['OPENAI_API_KEY'] = "" # litellm reads OPENAI_API_KEY from .env and sends the request
# Example dummy function hard coded to return the same weather
# In production, this could be your backend API or an external API
def get_current_weather(location, unit="fahrenheit"):
    """Get the current weather in a given location"""
    if "tokyo" in location.lower():
        return json.dumps({"location": "Tokyo", "temperature": "10", "unit": "celsius"})
    elif "san francisco" in location.lower():
        return json.dumps({"location": "San Francisco", "temperature": "72", "unit": "fahrenheit"})
    elif "paris" in location.lower():
        return json.dumps({"location": "Paris", "temperature": "22", "unit": "celsius"})
    else:
        return json.dumps({"location": location, "temperature": "unknown"})

messages = [{"role": "user", "content": "What's the weather like in San Francisco, Tokyo, and Paris?"}]
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

response = litellm.completion(
    model="gpt-3.5-turbo-1106",
    messages=messages,
    tools=tools,
    tool_choice="auto",  # auto is default, but we'll be explicit
)
print("\nLLM Response1:\n", response)
response_message = response.choices[0].message
tool_calls = response.choices[0].message.tool_calls
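
Once tool_calls is available, the usual next step is to run each requested function and send the results back to the model. The sketch below covers only the local execution step and runs without an API call. run_tool_calls is a hypothetical helper, the shortened get_current_weather is a stand-in for the function defined above, and the SimpleNamespace objects mimic the tool call objects in the response.

```python
import json
from types import SimpleNamespace


def get_current_weather(location, unit="fahrenheit"):
    """Stand-in for the dummy weather function defined above."""
    return json.dumps({"location": location, "temperature": "unknown", "unit": unit})


def run_tool_calls(tool_calls, available_functions):
    """Execute each requested function and build the matching tool messages."""
    tool_messages = []
    for call in tool_calls or []:
        function = available_functions[call.function.name]
        arguments = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        tool_messages.append(
            {
                "tool_call_id": call.id,
                "role": "tool",
                "name": call.function.name,
                "content": function(**arguments),
            }
        )
    return tool_messages


# Stand-ins for response.choices[0].message.tool_calls
fake_tool_calls = [
    SimpleNamespace(
        id=f"call_{i}",
        function=SimpleNamespace(
            name="get_current_weather",
            arguments=json.dumps({"location": city}),
        ),
    )
    for i, city in enumerate(["San Francisco", "Tokyo", "Paris"])
]

tool_messages = run_tool_calls(fake_tool_calls, {"get_current_weather": get_current_weather})
```

In the full flow, append response_message and then tool_messages to messages, and call litellm.completion() again so the model can compose its final answer from the tool results.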

Setting extra_headers for completion calls

import os
from litellm import completion

os.environ["OPENAI_API_KEY"] = "your-api-key"

response = completion(
    model = "gpt-3.5-turbo", 
    messages=[{ "content": "Hello, how are you?","role": "user"}],
    extra_headers={"AI-Resource Group": "ishaan-resource"}
)

Setting Organization-ID for completion calls

This can be set in several ways; the example below uses the OPENAI_ORGANIZATION environment variable:

import os 
from litellm import completion

os.environ["OPENAI_API_KEY"] = "your-api-key"
os.environ["OPENAI_ORGANIZATION"] = "your-org-id" # OPTIONAL

response = completion(
    model = "gpt-3.5-turbo", 
    messages=[{ "content": "Hello, how are you?","role": "user"}]
)

Set ssl_verify=False

This is done by setting your own httpx.Client

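import asyncio

import httpx
import litellm

messages = [{"role": "user", "content": "Hello, how are you?"}]

# for completion
litellm.client_session = httpx.Client(verify=False)
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=messages,
)

# for acompletion (a coroutine, so it must be awaited or run in an event loop)
litellm.aclient_session = httpx.AsyncClient(verify=False)
response = asyncio.run(
    litellm.acompletion(
        model="gpt-3.5-turbo",
        messages=messages,
    )
)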

Using OpenAI Proxy with LiteLLM

import os 
import litellm
from litellm import completion

os.environ["OPENAI_API_KEY"] = ""

# set custom api base to your proxy
# either set .env or litellm.api_base
# os.environ["OPENAI_BASE_URL"] = "https://your_host/v1"
litellm.api_base = "https://your_host/v1"


messages = [{ "content": "Hello, how are you?","role": "user"}]

# openai call
response = completion("openai/your-model-name", messages)

If you need to set api_base dynamically, pass it to completion() instead: completion(..., api_base="your-proxy-api-base")

For more, see setting API Base/Keys.

Forwarding Org ID for Proxy requests

Forward OpenAI org IDs from the client to OpenAI with the forward_openai_org_id param.

1. Set up config.yaml

model_list:
  - model_name: "gpt-3.5-turbo"
    litellm_params:
      model: gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

general_settings:
    forward_openai_org_id: true # 👈 KEY CHANGE

2. Start the proxy

litellm --config config.yaml --detailed_debug

# RUNNING on http://0.0.0.0:4000

3. Make an OpenAI call

from openai import OpenAI
client = OpenAI(
    api_key="sk-1234",
    organization="my-special-org",
    base_url="http://0.0.0.0:4000"
)

client.chat.completions.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hello world"}])

In your logs you should see the forwarded org id

LiteLLM:DEBUG: utils.py:255 - Request to litellm:
LiteLLM:DEBUG: utils.py:255 - litellm.acompletion(... organization='my-special-org',)

GPT-5 Pro Special Notes

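GPT-5 Pro is a high-effort OpenAI reasoning model. As the reasoning_effort table above shows, it defaults to reasoning_effort="high" and supports only that value.

# GPT-5 Pro usage example
from litellm import completion
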
response = completion(
    model="gpt-5-pro", 
    messages=[{"role": "user", "content": "Solve this complex reasoning problem..."}]
)

Video Generation

LiteLLM supports OpenAI's video generation models including Sora.

For detailed documentation on video generation, see the OpenAI Video Generation guide.