AWS Bedrock | liteLLM (original) (raw)
ALL Bedrock models (Anthropic, Meta, Deepseek, Mistral, Amazon, etc.) are Supported
| Property | Details |
|---|---|
| Description | Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs). |
| Provider Route on LiteLLM | bedrock/, bedrock/converse/, bedrock/invoke/, bedrock/converse_like/, bedrock/llama/, bedrock/deepseek_r1/, bedrock/qwen3/, bedrock/qwen2/, bedrock/openai/, bedrock/moonshot |
| Provider Doc | Amazon Bedrock ↗ |
| Supported OpenAI Endpoints | /chat/completions, /completions, /embeddings, /images/generations, /v1/realtime |
| Rerank Endpoint | /rerank |
| Pass-through Endpoint | Supported |
LiteLLM requires boto3 to be installed on your system for Bedrock requests
info
For Amazon Nova Models: Bump to v1.53.5+
Authentication
LiteLLM supports API key authentication in addition to traditional boto3 authentication methods. For additional API key details, refer to docs.
Option 1: use the AWS_BEARER_TOKEN_BEDROCK environment variable
export AWS_BEARER_TOKEN_BEDROCK="your-api-key"
Option 2: use the api_key parameter to pass in API key for completion, embedding, image_generation API calls.
- SDK
- PROXY
response = completion(
model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
messages=[{ "content": "Hello, how are you?","role": "user"}],
api_key="your-api-key"
)
Usage
import os
from litellm import completion
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
response = completion(
model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
messages=[{ "content": "Hello, how are you?","role": "user"}]
)
LiteLLM Proxy Usage
Here's how to call Bedrock with the LiteLLM Proxy Server
1. Setup config.yaml
model_list:
- model_name: bedrock-claude-3-5-sonnet
litellm_params:
model: bedrock/us.anthropic.claude-haiku-4-5-20251001-v1:0
aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
aws_region_name: os.environ/AWS_REGION_NAME
All possible auth params:
aws_access_key_id: Optional[str],
aws_secret_access_key: Optional[str],
aws_session_token: Optional[str],
aws_region_name: Optional[str],
aws_session_name: Optional[str],
aws_profile_name: Optional[str],
aws_role_name: Optional[str],
aws_web_identity_token: Optional[str],
aws_bedrock_runtime_endpoint: Optional[str],
api_key: Optional[str],
2. Start the proxy
litellm --config /path/to/config.yaml
3. Test it
- Curl Request
- OpenAI v1.0.0+
- Langchain
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data ' {
"model": "bedrock-claude-v1",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}
'
Set temperature, top p, etc.
- SDK
- PROXY
import os
from litellm import completion
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
response = completion(
model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
messages=[{ "content": "Hello, how are you?","role": "user"}],
temperature=0.7,
top_p=1
)
Pass provider-specific params
If you pass a non-openai param to litellm, we'll assume it's provider-specific and send it as a kwarg in the request body. See more
- SDK
- PROXY
import os
from litellm import completion
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
response = completion(
model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
messages=[{ "content": "Hello, how are you?","role": "user"}],
top_k=1 # 👈 PROVIDER-SPECIFIC PARAM
)
Attach metadata to Bedrock requests for logging and cost attribution.
- SDK
- PROXY
import os
from litellm import completion
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
response = completion(
model="bedrock/us.anthropic.claude-haiku-4-5-20251001-v1:0",
messages=[{"role": "user", "content": "Hello, how are you?"}],
requestMetadata={
"cost_center": "engineering",
"user_id": "user123"
}
)
LiteLLM supports tool calling via Bedrock's Converse and Invoke API's.
- SDK
- PROXY
from litellm import completion
# set env
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
tools = [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
},
"required": ["location"],
},
},
}
]
messages = [{"role": "user", "content": "What's the weather like in Boston today?"}]
response = completion(
model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
messages=messages,
tools=tools,
tool_choice="auto",
)
# Add any assertions, here to check response args
print(response)
assert isinstance(response.choices[0].message.tool_calls[0].function.name, str)
assert isinstance(
response.choices[0].message.tool_calls[0].function.arguments, str
)
Usage - Vision
from litellm import completion
# set env
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
def encode_image(image_path):
import base64
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
image_path = "../proxy/cached_logo.jpg"
# Getting the base64 string
base64_image = encode_image(image_path)
resp = litellm.completion(
model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "Whats in this image?"},
{
"type": "image_url",
"image_url": {
"url": "data:image/jpeg;base64," + base64_image
},
},
],
}
],
)
print(f"\nResponse: {resp}")
Usage - 'thinking' / 'reasoning content'
This is currently only supported for Anthropic's Claude 3.7 Sonnet + Deepseek R1 + GPT-OSS models.
Works on v1.61.20+.
Returns 2 new fields in message and delta object:
reasoning_content- string - The reasoning content of the responsethinking_blocks- list of objects (Anthropic only) - The thinking blocks of the response
Each object has the following fields:
type- Literal["thinking"] - The type of thinking blockthinking- string - The thinking of the response. Also returned inreasoning_contentsignature- string - A base64 encoded string, returned by Anthropic.
The signature is required by Anthropic on subsequent calls, if 'thinking' content is passed in (only required to use thinking with tool calling). Learn more
- SDK
- PROXY
from litellm import completion
# set env
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
resp = completion(
model="bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0",
messages=[{"role": "user", "content": "What is the capital of France?"}],
reasoning_effort="low",
)
print(resp)
Expected Response
Same as Anthropic API response.
{
"id": "chatcmpl-c661dfd7-7530-49c9-b0cc-d5018ba4727d",
"created": 1740640366,
"model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
"object": "chat.completion",
"system_fingerprint": null,
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"content": "The capital of France is Paris. It's not only the capital city but also the largest city in France, serving as the country's major cultural, economic, and political center.",
"role": "assistant",
"tool_calls": null,
"function_call": null,
"reasoning_content": "The capital of France is Paris. This is a straightforward factual question.",
"thinking_blocks": [
{
"type": "thinking",
"thinking": "The capital of France is Paris. This is a straightforward factual question.",
"signature": "EqoBCkgIARABGAIiQL2UoU0b1OHYi+yCHpBY7U6FQW8/FcoLewocJQPa2HnmLM+NECy50y44F/kD4SULFXi57buI9fAvyBwtyjlOiO0SDE3+r3spdg6PLOo9PBoMma2ku5OTAoR46j9VIjDRlvNmBvff7YW4WI9oU8XagaOBSxLPxElrhyuxppEn7m6bfT40dqBSTDrfiw4FYB4qEPETTI6TA6wtjGAAqmFqKTo="
}
]
}
}
],
"usage": {
"completion_tokens": 64,
"prompt_tokens": 42,
"total_tokens": 106,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
Pass thinking to Anthropic models
Same as Anthropic API response.
Usage - Bedrock search citations in /chat/completions
If your tool returns search sources and you want citation metadata in the final assistant response, pass search_results on the role: "tool" message.
Request shape
{
"model": "bedrock-claude-3-7",
"messages": [
{
"role": "user",
"content": "What is XX?"
},
{
"role": "assistant",
"tool_calls": [
{
"id": "tooluse_a4rBqeZNRTKj2lTskvaO4H",
"type": "function",
"function": {
"name": "RAGRequest",
"arguments": "{\"query\":\"What is Apptio?\"}"
}
}
]
},
{
"role": "tool",
"tool_call_id": "tooluse_a4rBqeZNRTKj2lTskvaO4H",
"content": "XX is a company that makes calls to Bedrock using passthrough APIs via LiteLLM",
"search_results": [
{
"source": "https://www.xx.com/about",
"title": "About XX",
"content": [
{
"text": "XX is a company that makes calls to Bedrock using passthrough APIs via LiteLLM"
}
],
"citations": {
"enabled": true
}
}
]
}
]
}
What you get back
LiteLLM returns regular assistant text in message.content and citation metadata in message.annotations:
{
"choices": [
{
"message": {
"role": "assistant",
"content": "XX is a technology business management company...",
"annotations": [
{
"type": "url_citation",
"url_citation": {
"start_index": 0,
"end_index": 42,
"title": "About XX",
"url": "https://www.xx.com/about"
}
}
]
}
}
]
}
note
If you only send plain tool.content text (without search_results), you will still get a normal answer, but no structured citation annotations.
Usage - Anthropic Beta Features
LiteLLM supports Anthropic's beta features on AWS Bedrock through the anthropic-beta header. This enables access to experimental features like:
- 1M Context Window - Up to 1 million tokens of context (Claude Opus 4.6, Sonnet 4.5, Sonnet 4)
- Computer Use Tools - AI that can interact with computer interfaces
- Token-Efficient Tools - More efficient tool usage patterns
- Extended Output - Up to 128K output tokens
- Enhanced Thinking - Advanced reasoning capabilities
Supported Beta Features
| Beta Feature | Header Value | Compatible Models | Description |
|---|---|---|---|
| 1M Context Window | context-1m-2025-08-07 | Claude Opus 4.6, Sonnet 4.5, Sonnet 4 | Enable 1 million token context window |
| Computer Use (Latest) | computer-use-2025-01-24 | Claude 3.7 Sonnet | Latest computer use tools |
| Computer Use (Legacy) | computer-use-2024-10-22 | Claude 3.5 Sonnet v2 | Computer use tools for Claude 3.5 |
| Token-Efficient Tools | token-efficient-tools-2025-02-19 | Claude 3.7 Sonnet | More efficient tool usage |
| Interleaved Thinking | interleaved-thinking-2025-05-14 | Claude 4 models | Enhanced thinking capabilities |
| Extended Output | output-128k-2025-02-19 | Claude 3.7 Sonnet | Up to 128K output tokens |
| Developer Thinking | dev-full-thinking-2025-05-14 | Claude 4 models | Raw thinking mode for developers |
- SDK
- PROXY
Single Beta Feature
from litellm import completion
import os
# set env
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
# Use 1M context window with Claude Sonnet 4
response = completion(
model="bedrock/anthropic.claude-sonnet-4-20250115-v1:0",
messages=[{"role": "user", "content": "Hello! Testing 1M context window."}],
max_tokens=100,
extra_headers={
"anthropic-beta": "context-1m-2025-08-07" # 👈 Enable 1M context
}
)
Multiple Beta Features
from litellm import completion
# Combine multiple beta features (comma-separated)
response = completion(
model="bedrock/converse/anthropic.claude-3-5-sonnet-20241022-v2:0",
messages=[{"role": "user", "content": "Testing multiple beta features"}],
max_tokens=100,
extra_headers={
"anthropic-beta": "computer-use-2024-10-22,context-1m-2025-08-07"
}
)
Computer Use Tools with Beta Features
from litellm import completion
# Computer use tools automatically add computer-use-2024-10-22
# You can add additional beta features
response = completion(
model="bedrock/converse/anthropic.claude-3-5-sonnet-20241022-v2:0",
messages=[{"role": "user", "content": "Take a screenshot"}],
tools=[{
"type": "computer_20241022",
"name": "computer",
"display_width_px": 1920,
"display_height_px": 1080
}],
extra_headers={
"anthropic-beta": "context-1m-2025-08-07" # Additional beta feature
}
)
info
Beta features may require special access or permissions in your AWS account. Some features are only available in specific AWS regions. Check the AWS Bedrock documentation for availability and access requirements.
Usage - Structured Output / JSON mode
- SDK
- PROXY
from litellm import completion
import os
from pydantic import BaseModel
# set env
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
class CalendarEvent(BaseModel):
name: str
date: str
participants: list[str]
class EventsList(BaseModel):
events: list[CalendarEvent]
response = completion(
model="bedrock/anthropic.claude-3-7-sonnet-20250219-v1:0", # specify invoke via `bedrock/invoke/anthropic.claude-3-7-sonnet-20250219-v1:0`
response_format=EventsList,
messages=[
{"role": "system", "content": "You are a helpful assistant designed to output JSON."},
{"role": "user", "content": "Who won the world series in 2020?"}
],
)
print(response.choices[0].message.content)
Usage - Latency Optimized Inference
Valid from v1.65.1+
- SDK
- PROXY
from litellm import completion
response = completion(
model="bedrock/anthropic.claude-3-7-sonnet-20250219-v1:0",
messages=[{"role": "user", "content": "What is the capital of France?"}],
performanceConfig={"latency": "optimized"},
)
Usage - Service Tier
Control the processing tier for your Bedrock requests using serviceTier. Valid values are priority, default, or flex.
priority: Higher priority processing with guaranteed capacitydefault: Standard processing tierflex: Cost-optimized processing for batch workloads
Bedrock ServiceTier API Reference
OpenAI-compatible service_tier parameter
LiteLLM also supports the OpenAI-style service_tier parameter, which is automatically translated to Bedrock's native serviceTier format:
| OpenAI service_tier | Bedrock serviceTier |
|---|---|
| "priority" | {"type": "priority"} |
| "default" | {"type": "default"} |
| "flex" | {"type": "flex"} |
| "auto" | {"type": "default"} |
from litellm import completion
# Using OpenAI-style service_tier parameter
response = completion(
model="bedrock/converse/anthropic.claude-3-sonnet-20240229-v1:0",
messages=[{"role": "user", "content": "Hello!"}],
service_tier="priority" # Automatically translated to serviceTier={"type": "priority"}
)
Native Bedrock serviceTier parameter
- SDK
- PROXY
from litellm import completion
response = completion(
model="bedrock/converse/qwen.qwen3-235b-a22b-2507-v1:0",
messages=[{"role": "user", "content": "What is the capital of France?"}],
serviceTier={"type": "priority"},
)
Usage - Bedrock Guardrails
Example of using Bedrock Guardrails with LiteLLM
Selective Content Moderation with guarded_text
LiteLLM supports selective content moderation using the guarded_text content type. This allows you to wrap only specific content that should be moderated by Bedrock Guardrails, rather than evaluating the entire conversation.
How it works:
- Content with
type: "guarded_text"gets automatically wrapped inguardrailConverseContentblocks - Only the wrapped content is evaluated by Bedrock Guardrails
- Regular content with
type: "text"bypasses guardrail evaluation
note
If guarded_text is not used, the entire conversation history will be sent to the guardrail for evaluation, which can increase latency and costs.
- LiteLLM SDK
- Proxy on request
- Proxy on config.yaml
from litellm import completion
# set env
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
response = completion(
model="anthropic.claude-v2",
messages=[
{
"content": "where do i buy coffee from? ",
"role": "user",
}
],
max_tokens=10,
guardrailConfig={
"guardrailIdentifier": "ff6ujrregl1q", # The identifier (ID) for the guardrail.
"guardrailVersion": "DRAFT", # The version of the guardrail.
"trace": "disabled", # The trace behavior for the guardrail. Can either be "disabled" or "enabled"
},
)
# Selective guardrail usage with guarded_text - only specific content is evaluated
response_guard = completion(
model="anthropic.claude-v2",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "What is the main topic of this legal document?"},
{"type": "guarded_text", "text": "This document contains sensitive legal information that should be moderated by guardrails."}
]
}
],
guardrailConfig={
"guardrailIdentifier": "gr-abc123",
"guardrailVersion": "DRAFT"
}
)
Usage - "Assistant Pre-fill"[](#usage---assistant-pre-fill "Direct link to Usage - "Assistant Pre-fill"")
If you're using Anthropic's Claude with Bedrock, you can "put words in Claude's mouth" by including an assistant role message as the last item in the messages array.
[!IMPORTANT] The returned completion will not include your "pre-fill" text, since it is part of the prompt itself. Make sure to prefix Claude's completion with your pre-fill.
import os
from litellm import completion
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
messages = [
{"role": "user", "content": "How do you say 'Hello' in German? Return your answer as a JSON object, like this:\n\n{ \"Hello\": \"Hallo\" }"},
{"role": "assistant", "content": "{"},
]
response = completion(model="bedrock/anthropic.claude-v2", messages=messages)
Example prompt sent to Claude
Human: How do you say 'Hello' in German? Return your answer as a JSON object, like this:
{ "Hello": "Hallo" }
Assistant: {
Usage - "System" messages[](#usage---system-messages "Direct link to Usage - "System" messages")
If you're using Anthropic's Claude 2.1 with Bedrock, system role messages are properly formatted for you.
import os
from litellm import completion
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
messages = [
{"role": "system", "content": "You are a snarky assistant."},
{"role": "user", "content": "How do I boil water?"},
]
response = completion(model="bedrock/anthropic.claude-v2:1", messages=messages)
Example prompt sent to Claude
You are a snarky assistant.
Human: How do I boil water?
Assistant:
Usage - Streaming
import os
from litellm import completion
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
response = completion(
model="bedrock/anthropic.claude-instant-v1",
messages=[{ "content": "Hello, how are you?","role": "user"}],
stream=True
)
for chunk in response:
print(chunk)
Example Streaming Output Chunk
{
"choices": [
{
"finish_reason": null,
"index": 0,
"delta": {
"content": "ase can appeal the case to a higher federal court. If a higher federal court rules in a way that conflicts with a ruling from a lower federal court or conflicts with a ruling from a higher state court, the parties involved in the case can appeal the case to the Supreme Court. In order to appeal a case to the Sup"
}
}
],
"created": null,
"model": "anthropic.claude-instant-v1",
"usage": {
"prompt_tokens": null,
"completion_tokens": null,
"total_tokens": null
}
}
Cross-region inferencing
LiteLLM supports Bedrock cross-region inferencing across all supported bedrock models.
- SDK
- PROXY
from litellm import completion
import os
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
litellm.set_verbose = True # 👈 SEE RAW REQUEST
response = completion(
model="bedrock/us.anthropic.claude-3-haiku-20240307-v1:0",
messages=messages,
max_tokens=10,
temperature=0.1,
)
print("Final Response: {}".format(response))
Set 'converse' / 'invoke' route
info
Supported from LiteLLM Version v1.53.5
LiteLLM defaults to the invoke route. LiteLLM uses the converse route for Bedrock models that support it.
To explicitly set the route, do bedrock/converse/<model> or bedrock/invoke/<model>.
E.g.
- SDK
- PROXY
from litellm import completion
completion(model="bedrock/converse/us.amazon.nova-pro-v1:0")
Alternate user/assistant messages
Use user_continue_message to add a default user message, for cases (e.g. Autogen) where the client might not follow alternating user/assistant messages starting and ending with a user message.
model_list:
- model_name: "bedrock-claude"
litellm_params:
model: "bedrock/anthropic.claude-instant-v1"
user_continue_message: {"role": "user", "content": "Please continue"}
OR
just set litellm.modify_params=True and LiteLLM will automatically handle this with a default user_continue_message.
model_list:
- model_name: "bedrock-claude"
litellm_params:
model: "bedrock/anthropic.claude-instant-v1"
litellm_settings:
modify_params: true
Test it!
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "bedrock-claude",
"messages": [{"role": "assistant", "content": "Hey, how's it going?"}]
}'
Usage - PDF / Document Understanding
LiteLLM supports Document Understanding for Bedrock models - AWS Bedrock Docs.
info
LiteLLM supports ALL Bedrock document types -
E.g.: "pdf", "csv", "doc", "docx", "xls", "xlsx", "html", "txt", "md"
You can also pass these as either image_url or base64
url
- SDK
- PROXY
from litellm.utils import supports_pdf_input, completion
# set aws credentials
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
# pdf url
image_url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
# Download the file
response = requests.get(url)
file_data = response.content
encoded_file = base64.b64encode(file_data).decode("utf-8")
# model
model = "bedrock/us.anthropic.claude-haiku-4-5-20251001-v1:0"
image_content = [
{"type": "text", "text": "What's this file about?"},
{
"type": "file",
"file": {
"file_data": f"data:application/pdf;base64,{encoded_file}", # 👈 PDF
}
},
]
if not supports_pdf_input(model, None):
print("Model does not support image input")
response = completion(
model=model,
messages=[{"role": "user", "content": image_content}],
)
assert response is not None
base64
- SDK
- PROXY
from litellm.utils import supports_pdf_input, completion
# set aws credentials
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
# pdf url
image_url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
response = requests.get(url)
file_data = response.content
encoded_file = base64.b64encode(file_data).decode("utf-8")
base64_url = f"data:application/pdf;base64,{encoded_file}"
# model
model = "bedrock/us.anthropic.claude-haiku-4-5-20251001-v1:0"
image_content = [
{"type": "text", "text": "What's this file about?"},
{
"type": "image_url",
"image_url": base64_url, # OR {"url": base64_url}
},
]
if not supports_pdf_input(model, None):
print("Model does not support image input")
response = completion(
model=model,
messages=[{"role": "user", "content": image_content}],
)
assert response is not None
OpenAI GPT OSS
| Property | Details |
|---|---|
| Provider Route | bedrock/converse/openai.gpt-oss-20b-1:0, bedrock/converse/openai.gpt-oss-120b-1:0 |
| Provider Documentation | Amazon Bedrock ↗ |
- SDK
- Proxy
GPT OSS SDK Usage
from litellm import completion
import os
# Set AWS credentials
os.environ["AWS_ACCESS_KEY_ID"] = "your-aws-access-key"
os.environ["AWS_SECRET_ACCESS_KEY"] = "your-aws-secret-key"
os.environ["AWS_REGION_NAME"] = "us-east-1"
# GPT OSS 20B model
response = completion(
model="bedrock/converse/openai.gpt-oss-20b-1:0",
messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response.choices[0].message.content)
# GPT OSS 120B model
response = completion(
model="bedrock/converse/openai.gpt-oss-120b-1:0",
messages=[{"role": "user", "content": "Explain machine learning in simple terms"}],
)
print(response.choices[0].message.content)
TwelveLabs Pegasus - Video Understanding
TwelveLabs Pegasus 1.2 is a video understanding model that can analyze and describe video content. LiteLLM supports this model through Bedrock's /invoke endpoint.
| Property | Details |
|---|---|
| Provider Route | bedrock/us.twelvelabs.pegasus-1-2-v1:0, bedrock/eu.twelvelabs.pegasus-1-2-v1:0 |
| Provider Documentation | TwelveLabs Pegasus Docs ↗ |
| Supported Parameters | max_tokens, temperature, response_format |
| Media Input | S3 URI or base64-encoded video |
Supported Features
- Video Analysis: Analyze video content from S3 or base64 input
- Structured Output: Support for JSON schema response format
- S3 Integration: Support for S3 video URLs with bucket owner specification
Usage with S3 Video
- SDK
- Proxy
TwelveLabs Pegasus SDK Usage
from litellm import completion
import os
# Set AWS credentials
os.environ["AWS_ACCESS_KEY_ID"] = "your-aws-access-key"
os.environ["AWS_SECRET_ACCESS_KEY"] = "your-aws-secret-key"
os.environ["AWS_REGION_NAME"] = "us-east-1"
response = completion(
model="bedrock/us.twelvelabs.pegasus-1-2-v1:0",
messages=[{"role": "user", "content": "Describe what happens in this video."}],
mediaSource={
"s3Location": {
"uri": "s3://your-bucket/video.mp4",
"bucketOwner": "123456789012", # 12-digit AWS account ID
}
},
temperature=0.2
)
print(response.choices[0].message.content)
Usage with Base64 Video
You can also pass video content directly as base64:
Base64 Video Input
from litellm import completion
import base64
# Read video file and encode to base64
with open("video.mp4", "rb") as video_file:
video_base64 = base64.b64encode(video_file.read()).decode("utf-8")
response = completion(
model="bedrock/us.twelvelabs.pegasus-1-2-v1:0",
messages=[{"role": "user", "content": "What is happening in this video?"}],
mediaSource={
"base64String": video_base64
},
temperature=0.2,
)
print(response.choices[0].message.content)
Important Notes
- Response Format: The model supports structured output via
response_formatwith JSON schema
Provisioned throughput models
To use provisioned throughput Bedrock models pass
model=bedrock/<base-model>, examplemodel=bedrock/anthropic.claude-v2. Setmodelto any of the Supported AWS modelsmodel_id=provisioned-model-arn
Completion
import litellm
response = litellm.completion(
model="bedrock/anthropic.claude-instant-v1",
model_id="provisioned-model-arn",
messages=[{"content": "Hello, how are you?", "role": "user"}]
)
Embedding
import litellm
response = litellm.embedding(
model="bedrock/amazon.titan-embed-text-v1",
model_id="provisioned-model-arn",
input=["hi"],
)
Supported AWS Bedrock Models
LiteLLM supports ALL Bedrock models.
Here's an example of using a bedrock model with LiteLLM. For a complete list, refer to the model cost map
| Model Name | Command |
|---|---|
| GPT-OSS 20B | completion(model='bedrock/converse/openai.gpt-oss-20b-1:0', messages=messages) |
| GPT-OSS 120B | completion(model='bedrock/converse/openai.gpt-oss-120b-1:0', messages=messages) |
| Deepseek R1 | completion(model='bedrock/us.deepseek.r1-v1:0', messages=messages) |
| Anthropic Claude Sonnet 4.5 | completion(model='bedrock/us.anthropic.claude-sonnet-4-5-20250929-v1:0', messages=messages) |
| Anthropic Claude-V3.5 Sonnet | completion(model='bedrock/us.anthropic.claude-haiku-4-5-20251001-v1:0', messages=messages) |
| Anthropic Claude-V3 sonnet | completion(model='bedrock/anthropic.claude-3-sonnet-20240229-v1:0', messages=messages) |
| Anthropic Claude-V3 Haiku | completion(model='bedrock/anthropic.claude-3-haiku-20240307-v1:0', messages=messages) |
| Anthropic Claude-V3 Opus | completion(model='bedrock/anthropic.claude-3-opus-20240229-v1:0', messages=messages) |
| Anthropic Claude-V2.1 | completion(model='bedrock/anthropic.claude-v2:1', messages=messages) |
| Anthropic Claude-V2 | completion(model='bedrock/anthropic.claude-v2', messages=messages) |
| Anthropic Claude-Instant V1 | completion(model='bedrock/anthropic.claude-instant-v1', messages=messages) |
| Meta llama3-1-405b | completion(model='bedrock/meta.llama3-1-405b-instruct-v1:0', messages=messages) |
| Meta llama3-1-70b | completion(model='bedrock/meta.llama3-1-70b-instruct-v1:0', messages=messages) |
| Meta llama3-1-8b | completion(model='bedrock/meta.llama3-1-8b-instruct-v1:0', messages=messages) |
| Meta llama3-70b | completion(model='bedrock/meta.llama3-70b-instruct-v1:0', messages=messages) |
| Meta llama3-8b | completion(model='bedrock/meta.llama3-8b-instruct-v1:0', messages=messages) |
| Amazon Titan Lite | completion(model='bedrock/amazon.titan-text-lite-v1', messages=messages) |
| Amazon Titan Express | completion(model='bedrock/amazon.titan-text-express-v1', messages=messages) |
| Cohere Command | completion(model='bedrock/cohere.command-text-v14', messages=messages) |
| AI21 J2-Mid | completion(model='bedrock/ai21.j2-mid-v1', messages=messages) |
| AI21 J2-Ultra | completion(model='bedrock/ai21.j2-ultra-v1', messages=messages) |
| AI21 Jamba-Instruct | completion(model='bedrock/ai21.jamba-instruct-v1:0', messages=messages) |
| Meta Llama 2 Chat 13b | completion(model='bedrock/meta.llama2-13b-chat-v1', messages=messages) |
| Meta Llama 2 Chat 70b | completion(model='bedrock/meta.llama2-70b-chat-v1', messages=messages) |
| Mistral 7B Instruct | completion(model='bedrock/mistral.mistral-7b-instruct-v0:2', messages=messages) |
| Mixtral 8x7B Instruct | completion(model='bedrock/mistral.mixtral-8x7b-instruct-v0:1', messages=messages) |
| TwelveLabs Pegasus 1.2 (US) | completion(model='bedrock/us.twelvelabs.pegasus-1-2-v1:0', messages=messages, mediaSource={...}) |
| TwelveLabs Pegasus 1.2 (EU) | completion(model='bedrock/eu.twelvelabs.pegasus-1-2-v1:0', messages=messages, mediaSource={...}) |
| Moonshot Kimi K2 Thinking | completion(model='bedrock/moonshot.kimi-k2-thinking', messages=messages) or completion(model='bedrock/invoke/moonshot.kimi-k2-thinking', messages=messages) |
Bedrock Embedding
API keys
This can be set as env variables or passed as params to litellm.embedding()
import os
os.environ["AWS_ACCESS_KEY_ID"] = "" # Access key
os.environ["AWS_SECRET_ACCESS_KEY"] = "" # Secret access key
os.environ["AWS_REGION_NAME"] = "" # us-east-1, us-east-2, us-west-1, us-west-2
Usage
from litellm import embedding
response = embedding(
model="bedrock/amazon.titan-embed-text-v1",
input=["good morning from litellm"],
)
print(response)
Titan V2 - encoding_format support
from litellm import embedding
# Float format (default)
response = embedding(
model="bedrock/amazon.titan-embed-text-v2:0",
input=["good morning from litellm"],
encoding_format="float" # Returns float array
)
# Binary format
response = embedding(
model="bedrock/amazon.titan-embed-text-v2:0",
input=["good morning from litellm"],
encoding_format="base64" # Returns base64 encoded binary
)
Supported AWS Bedrock Embedding Models
| Model Name | Usage | Supported Additional OpenAI params |
|---|---|---|
| Titan Embeddings V2 | embedding(model="bedrock/amazon.titan-embed-text-v2:0", input=input) | dimensions, encoding_format |
| Titan Embeddings - V1 | embedding(model="bedrock/amazon.titan-embed-text-v1", input=input) | here |
| Titan Multimodal Embeddings | embedding(model="bedrock/amazon.titan-embed-image-v1", input=input) | here |
| Cohere Embeddings - English | embedding(model="bedrock/cohere.embed-english-v3", input=input) | here |
| Cohere Embeddings - Multilingual | embedding(model="bedrock/cohere.embed-multilingual-v3", input=input) | here |
Advanced - Drop Unsupported Params
Advanced - Pass model/provider-specific Params
Image Generation
See Bedrock Image Generation for using Stable Diffusion and Amazon Nova Canvas models on Bedrock.
Rerank API
See Bedrock Rerank for using Bedrock's Rerank API in the Cohere /rerank format.
Bedrock Application Inference Profile
Use Bedrock Application Inference Profile to track costs for projects on AWS.
You can either pass it in the model name - model="bedrock/arn:... or as a separate model_id="arn:.. param.
Set via model_id
- SDK
- PROXY
from litellm import completion
import os
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
response = completion(
model="bedrock/us.anthropic.claude-haiku-4-5-20251001-v1:0",
messages=[{"role": "user", "content": "Hello, how are you?"}],
model_id="arn:aws:bedrock:eu-central-1:000000000000:application-inference-profile/a0a0a0a0a0a0",
)
print(response)
Boto3 - Authentication
Passing credentials as parameters - Completion()
Pass AWS credentials as parameters to litellm.completion
import os
from litellm import completion
response = completion(
model="bedrock/anthropic.claude-instant-v1",
messages=[{ "content": "Hello, how are you?","role": "user"}],
aws_access_key_id="",
aws_secret_access_key="",
aws_region_name="",
)
This can be used to override existing headers (e.g. Authorization) when calling custom api endpoints
- SDK
- PROXY
import os
import litellm
from litellm import completion
litellm.set_verbose = True # 👈 SEE RAW REQUEST
response = completion(
model="bedrock/anthropic.claude-instant-v1",
messages=[{ "content": "Hello, how are you?","role": "user"}],
aws_access_key_id="",
aws_secret_access_key="",
aws_region_name="",
aws_bedrock_runtime_endpoint="https://my-fake-endpoint.com",
extra_headers={"key": "value"}
)
SSO Login (AWS Profile)
- Set
AWS_PROFILEenvironment variable - Make bedrock completion call
import os
from litellm import completion
response = completion(
model="bedrock/anthropic.claude-instant-v1",
messages=[{ "content": "Hello, how are you?","role": "user"}]
)
or pass aws_profile_name:
import os
from litellm import completion
response = completion(
model="bedrock/anthropic.claude-instant-v1",
messages=[{ "content": "Hello, how are you?","role": "user"}],
aws_profile_name="dev-profile",
)
STS (Role-based Auth)
- Set
aws_role_nameandaws_session_name
| LiteLLM Parameter | Boto3 Parameter | Description | Boto3 Documentation |
|---|---|---|---|
| aws_access_key_id | aws_access_key_id | AWS access key associated with an IAM user or role | Credentials |
| aws_secret_access_key | aws_secret_access_key | AWS secret key associated with the access key | Credentials |
| aws_role_name | RoleArn | The Amazon Resource Name (ARN) of the role to assume | AssumeRole API |
| aws_session_name | RoleSessionName | An identifier for the assumed role session | AssumeRole API |
IAM Roles Anywhere (On-Premise / External Workloads)
IAM Roles Anywhere extends IAM roles to workloads outside of AWS (on-premise servers, edge devices, other clouds). It uses the same STS mechanism as regular IAM roles but authenticates via X.509 certificates instead of AWS credentials.
Setup: Configure the AWS Signing Helper as a credential process in ~/.aws/config:
[profile litellm-roles-anywhere]
credential_process = aws_signing_helper credential-process \
--certificate /path/to/certificate.pem \
--private-key /path/to/private-key.pem \
--trust-anchor-arn arn:aws:rolesanywhere:us-east-1:123456789012:trust-anchor/abc123 \
--profile-arn arn:aws:rolesanywhere:us-east-1:123456789012:profile/def456 \
--role-arn arn:aws:iam::123456789012:role/MyBedrockRole
Usage: Reference the profile in LiteLLM:
- SDK
- PROXY
from litellm import completion
response = completion(
model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
messages=[{"role": "user", "content": "Hello!"}],
aws_profile_name="litellm-roles-anywhere",
)
See the IAM Roles Anywhere Getting Started Guide for trust anchor and profile setup.
Make the bedrock completion call
Required AWS IAM Policy for AssumeRole
To use aws_role_name (STS AssumeRole) with LiteLLM, your IAM user or role must have permission to call sts:AssumeRole on the target role. If you see an error like:
An error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws:sts::...:assumed-role/litellm-ecs-task-role/... is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::...:role/Enterprise/BedrockCrossAccountConsumer
This means the IAM identity running LiteLLM does not have permission to assume the target role. You must update your IAM policy to allow this action.
Example IAM Policy
Replace <TARGET_ROLE_ARN> with the ARN of the role you want to assume (e.g., arn:aws:iam::123456789012:role/Enterprise/BedrockCrossAccountConsumer).
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Resource": "<TARGET_ROLE_ARN>"
}
]
}
Note: The target role itself must also trust the calling IAM identity (via its trust policy) for AssumeRole to succeed. See AWS AssumeRole docs for more details.
- SDK
- PROXY
from litellm import completion
response = completion(
model="bedrock/anthropic.claude-instant-v1",
messages=messages,
max_tokens=10,
temperature=0.1,
aws_role_name=aws_role_name,
aws_session_name="my-test-session",
)
If you also need to dynamically set the aws user accessing the role, add the additional args in the completion()/embedding() function
from litellm import completion
response = completion(
model="bedrock/anthropic.claude-instant-v1",
messages=messages,
max_tokens=10,
temperature=0.1,
aws_region_name=aws_region_name,
aws_access_key_id=aws_access_key_id,
aws_secret_access_key=aws_secret_access_key,
aws_role_name=aws_role_name,
aws_session_name="my-test-session",
)
Passing an external BedrockRuntime.Client as a parameter - Completion()
This is a deprecated flow. Boto3 is not async. And boto3.client does not let us make the http call through httpx. Pass in your aws params through the method above 👆. See Auth Code Add new auth flow
warning
Experimental - 2024-Jun-23:aws_access_key_id, aws_secret_access_key, and aws_session_token will be extracted from boto3.client and be passed into the httpx client
Pass an external BedrockRuntime.Client object as a parameter to litellm.completion. Useful when using an AWS credentials profile, SSO session, assumed role session, or if environment variables are not available for auth.
Create a client from session credentials:
import boto3
from litellm import completion
bedrock = boto3.client(
service_name="bedrock-runtime",
region_name="us-east-1",
aws_access_key_id="",
aws_secret_access_key="",
aws_session_token="",
)
response = completion(
model="bedrock/anthropic.claude-instant-v1",
messages=[{ "content": "Hello, how are you?","role": "user"}],
aws_bedrock_client=bedrock,
)
Create a client from AWS profile in ~/.aws/config:
import boto3
from litellm import completion
dev_session = boto3.Session(profile_name="dev-profile")
bedrock = dev_session.client(
service_name="bedrock-runtime",
region_name="us-east-1",
)
response = completion(
model="bedrock/anthropic.claude-instant-v1",
messages=[{ "content": "Hello, how are you?","role": "user"}],
aws_bedrock_client=bedrock,
)
Calling via Internal Proxy (not bedrock url compatible)
Use the bedrock/converse_like/model endpoint to call bedrock converse model via your internal proxy.
- SDK
- LiteLLM Proxy
from litellm import completion
response = completion(
model="bedrock/converse_like/some-model",
messages=[{"role": "user", "content": "What's AWS?"}],
api_key="sk-1234",
api_base="https://some-api-url/models",
extra_headers={"test": "hello world"},
)
Expected Output URL
https://some-api-url/models