Snowflake Cortex | liteLLM

LiteLLM supports all models on the Snowflake Cortex REST API, including models from Anthropic (Claude), OpenAI (GPT), Meta (Llama), Mistral, DeepSeek, and Snowflake.

Description	Snowflake Cortex REST API provides access to leading frontier LLMs through OpenAI-compatible and Anthropic-compatible endpoints. All inference runs within Snowflake's security perimeter.
Provider Route on LiteLLM	snowflake/
Provider Docs	Cortex REST API ↗
API Endpoints	Chat Completions: https://{account}.snowflakecomputing.com/api/v2/cortex/v1/chat/completions
	Messages: https://{account}.snowflakecomputing.com/api/v2/cortex/v1/messages
	Legacy: https://{account}.snowflakecomputing.com/api/v2/cortex/inference:complete
Supported OpenAI Endpoints	/chat/completions, /completions, /embeddings
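The endpoint URLs above differ only in the account identifier and the path. A minimal sketch that builds them from an account identifier follows; the helper names cortex_endpoints and cortex_api_base are hypothetical and not part of LiteLLM.

```python
# Paths copied from the API Endpoints row above.
CORTEX_PATHS = {
    "chat_completions": "/api/v2/cortex/v1/chat/completions",
    "messages": "/api/v2/cortex/v1/messages",
    "legacy": "/api/v2/cortex/inference:complete",
}


def cortex_endpoints(account):
    """Return the three Cortex REST endpoint URLs for one account identifier."""
    host = f"https://{account}.snowflakecomputing.com"
    return {name: host + path for name, path in CORTEX_PATHS.items()}


def cortex_api_base(account):
    """Return the value this page uses for SNOWFLAKE_API_BASE."""
    return f"https://{account}.snowflakecomputing.com/api/v2/cortex/v1"


if __name__ == "__main__":
    for name, url in cortex_endpoints("<account>").items():
        print(f"{name}: {url}")
```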

Tip: LiteLLM supports all Snowflake Cortex models. Prefix the model name with snowflake/ (model=snowflake/<model-name>) when sending LiteLLM requests.
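If model names come from configuration or user input, a small helper can apply the prefix consistently. This is an illustrative sketch; to_litellm_model is a hypothetical name, not a LiteLLM function.

```python
PROVIDER_PREFIX = "snowflake/"


def to_litellm_model(cortex_model):
    """Add the snowflake/ provider prefix unless it is already present."""
    name = cortex_model.strip()
    if not name:
        raise ValueError("model name must not be empty")
    if name.startswith(PROVIDER_PREFIX):
        return name
    return PROVIDER_PREFIX + name


if __name__ == "__main__":
    print(to_litellm_model("claude-sonnet-4-5"))
```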

Authentication

Snowflake Cortex REST API supports three authentication methods. The two covered below are programmatic access tokens and key-pair JWTs.

Programmatic Access Token (PAT) — Recommended

This is the simplest approach: generate a PAT in Snowsight under User Menu → My Profile → Programmatic Access Tokens.

import os
from litellm import completion

os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-programmatic-access-token>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"

response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
)

JWT (Key-Pair Authentication)

Generate a JWT from a Snowflake key pair. See Key-pair authentication.

import os
from litellm import completion

os.environ["SNOWFLAKE_JWT"] = "<your-jwt-token>"
os.environ["SNOWFLAKE_ACCOUNT_ID"] = "<orgname>-<account_name>"

response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
)
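This page does not show how the JWT itself is produced. The sketch below builds only the claim set, following the layout described in Snowflake's key-pair authentication guide as understood here: issuer ACCOUNT.USER.SHA256:<fingerprint>, subject ACCOUNT.USER, and a lifetime of at most one hour. Treat that layout, the function name build_jwt_claims, and the one-hour cap as assumptions to verify against the Snowflake docs. Signing the claims with RS256 (for example with a JWT library) is not shown.

```python
import time

# Assumption: Snowflake rejects key-pair JWTs that live longer than one hour.
MAX_LIFETIME_S = 3600


def build_jwt_claims(account, user, public_key_fp, lifetime_s=3540, now=None):
    """Build the unsigned claim set for a Snowflake key-pair JWT.

    account       -- account identifier, e.g. "<orgname>-<account_name>"
    user          -- Snowflake user name
    public_key_fp -- public key fingerprint including the "SHA256:" prefix
    """
    if not 0 < lifetime_s <= MAX_LIFETIME_S:
        raise ValueError("lifetime_s must be between 1 and 3600 seconds")
    issued_at = int(time.time()) if now is None else int(now)
    # Assumption: account and user are upper-cased in the claims.
    qualified_user = f"{account.upper()}.{user.upper()}"
    return {
        "iss": f"{qualified_user}.{public_key_fp}",
        "sub": qualified_user,
        "iat": issued_at,
        "exp": issued_at + lifetime_s,
    }
```

The signed token is what goes into SNOWFLAKE_JWT in the example above.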

Pass credentials as parameters

from litellm import completion

# Using PAT
response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key="pat/<your-pat-token>",
    api_base="https://<account>.snowflakecomputing.com/api/v2/cortex/v1",
)

# Using JWT
response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key="<your-jwt-token>",
    account_id="<orgname>-<account_name>",
)

For all authentication options, see Authenticating to Cortex REST API.
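When one code path has to work with either credential type, the choice can be isolated in a helper that returns keyword arguments for completion(). A minimal sketch follows, using the environment variable and parameter names from the examples above. The helper name snowflake_auth_kwargs and the rule that a PAT takes precedence over a JWT are assumptions made for this example.

```python
def snowflake_auth_kwargs(env):
    """Pick PAT or JWT credentials from a mapping such as os.environ."""
    pat = env.get("SNOWFLAKE_API_KEY")
    if pat:
        kwargs = {"api_key": pat}
        api_base = env.get("SNOWFLAKE_API_BASE")
        if api_base:
            kwargs["api_base"] = api_base
        return kwargs

    jwt_token = env.get("SNOWFLAKE_JWT")
    account_id = env.get("SNOWFLAKE_ACCOUNT_ID")
    if jwt_token and account_id:
        return {"api_key": jwt_token, "account_id": account_id}

    raise RuntimeError(
        "Set SNOWFLAKE_API_KEY, or both SNOWFLAKE_JWT and SNOWFLAKE_ACCOUNT_ID"
    )
```

The result can be splatted into a call, for example completion(model=..., messages=..., **snowflake_auth_kwargs(os.environ)).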

Usage

from litellm import completion
import os

os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-pat>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"

response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "What is Snowflake Cortex?"}],
)
print(response.choices[0].message.content)

Supported OpenAI Parameters

temperature, max_tokens, top_p, stream, response_format,
tools, tool_choice
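Callers that forward arbitrary OpenAI-style options may want to check them against this list before sending a request. The sketch below does the check client-side. The names SUPPORTED_PARAMS and split_params are hypothetical, and the set is copied from the list above rather than read from LiteLLM.

```python
# Copied from the parameter list above.
SUPPORTED_PARAMS = frozenset(
    {
        "temperature",
        "max_tokens",
        "top_p",
        "stream",
        "response_format",
        "tools",
        "tool_choice",
    }
)


def split_params(params):
    """Return (supported, dropped): the usable options and the names left out."""
    supported = {k: v for k, v in params.items() if k in SUPPORTED_PARAMS}
    dropped = sorted(set(params) - SUPPORTED_PARAMS)
    return supported, dropped


if __name__ == "__main__":
    ok, dropped = split_params({"temperature": 0.2, "frequency_penalty": 0.5})
    print(ok, dropped)
```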

Streaming

from litellm import completion
import os

os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-pat>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"

response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Write a haiku about data."}],
    stream=True,
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
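To keep the full text instead of only printing it, the same loop can accumulate the deltas. A minimal sketch: collect_stream is a hypothetical helper, and fake_chunk only mimics the chunk attributes used in the loop above so the example runs without a Snowflake account.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text deltas of a streamed response into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        text = chunk.choices[0].delta.content
        if text:  # content can be None or empty on some chunks
            parts.append(text)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in for a streamed chunk, used only for this demonstration."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


if __name__ == "__main__":
    demo = [fake_chunk("Rows "), fake_chunk(None), fake_chunk("fall like snow")]
    print(collect_stream(demo))
```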

Function Calling

Function calling is supported on Claude and select other models. LiteLLM automatically transforms the OpenAI tool format into Snowflake's tool_spec format.
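To give a rough idea of what that transformation involves, the sketch below maps one OpenAI-style tool definition to a tool_spec entry. The target shape (type "generic", input_schema) is an assumption based on Snowflake's published Cortex tool format. The function openai_tool_to_tool_spec is hypothetical, and LiteLLM's internal conversion may differ.

```python
def openai_tool_to_tool_spec(tool):
    """Map one OpenAI function tool to an assumed Snowflake tool_spec entry."""
    if tool.get("type") != "function":
        raise ValueError("only function tools are handled in this sketch")
    fn = tool["function"]
    return {
        "tool_spec": {
            "type": "generic",  # assumed constant for user-defined tools
            "name": fn["name"],
            "description": fn.get("description", ""),
            # The JSON Schema under "parameters" is carried over unchanged.
            "input_schema": fn.get(
                "parameters", {"type": "object", "properties": {}}
            ),
        }
    }
```

Application code does not need to call anything like this; passing OpenAI-format tools to completion() is enough.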

from litellm import completion
import os

os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-pat>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"],
            },
        },
    }
]

response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    tools=tools,
    tool_choice="auto",
)

print(response.choices[0].message.tool_calls)
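The example above stops at printing the tool calls. A typical next step is to run each requested function and send the results back as role "tool" messages. The sketch below assumes the OpenAI tool-call shape (id, function.name, and function.arguments as a JSON string). TOOL_REGISTRY, run_tool_calls, and the placeholder get_weather are hypothetical names for illustration.

```python
import json
from types import SimpleNamespace


def get_weather(location):
    """Placeholder implementation; a real one would call a weather service."""
    return f"Sunny in {location}"


TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls):
    """Execute each tool call and return the matching role="tool" messages."""
    messages = []
    for call in tool_calls or []:
        handler = TOOL_REGISTRY.get(call.function.name)
        if handler is None:
            result = f"Unknown tool: {call.function.name}"
        else:
            args = json.loads(call.function.arguments or "{}")
            result = handler(**args)
        messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": str(result)}
        )
    return messages


if __name__ == "__main__":
    fake_call = SimpleNamespace(
        id="call_1",
        function=SimpleNamespace(
            name="get_weather", arguments='{"location": "San Francisco"}'
        ),
    )
    print(run_tool_calls([fake_call]))
```

Appending the assistant message and these tool messages to the conversation, then calling completion() again, lets the model produce its final answer.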

Thinking / Reasoning

Claude 3.7 Sonnet, Claude 4 Opus, and DeepSeek R1 on Cortex support extended thinking. LiteLLM translates reasoning_effort to the provider's thinking parameter.

reasoning_effort	budget_tokens
"low"	1024
"medium"	2048
"high"	4096

from litellm import completion

response = completion(
    model="snowflake/claude-3-7-sonnet",
    messages=[{"role": "user", "content": "Solve: what is 127 * 389?"}],
    reasoning_effort="low",
)
print(response.choices[0].message.content)
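The reasoning_effort table above amounts to a simple lookup. The sketch below reproduces it to make the mapping explicit. The returned dictionary follows Anthropic's thinking parameter shape, which is an assumption about what is sent to the provider, and thinking_param is a hypothetical helper rather than a LiteLLM function.

```python
# Values copied from the reasoning_effort table above.
REASONING_BUDGETS = {"low": 1024, "medium": 2048, "high": 4096}


def thinking_param(reasoning_effort):
    """Translate a reasoning_effort level into a thinking configuration."""
    try:
        budget = REASONING_BUDGETS[reasoning_effort]
    except KeyError:
        allowed = ", ".join(REASONING_BUDGETS)
        raise ValueError(f"reasoning_effort must be one of: {allowed}") from None
    # Assumed shape, modelled on Anthropic's "thinking" request parameter.
    return {"type": "enabled", "budget_tokens": budget}


if __name__ == "__main__":
    print(thinking_param("low"))
```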

Prompt Caching

Snowflake Cortex supports prompt caching to reduce costs:

OpenAI models: Implicit caching for prompts ≥ 1,024 tokens (no code changes needed)
Claude models: Explicit caching via cache_control breakpoints
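For Claude models, a cache breakpoint is usually expressed by attaching cache_control to a content block. The sketch below builds such a message in the Anthropic-style format with type "ephemeral". That Cortex accepts this exact shape through the snowflake/ route is an assumption, and cached_system_message is a hypothetical helper.

```python
def cached_system_message(text):
    """Build a system message whose text block ends with a cache breakpoint."""
    return {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": text,
                # Assumed Anthropic-style breakpoint marker.
                "cache_control": {"type": "ephemeral"},
            }
        ],
    }


if __name__ == "__main__":
    long_instructions = "You are a support assistant. " * 200
    messages = [
        cached_system_message(long_instructions),
        {"role": "user", "content": "Summarize the policy."},
    ]
    print(messages[0]["content"][0]["cache_control"])
```

The long, stable part of the prompt goes in the cached block; the part that changes per request stays outside it.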

Cached input tokens are billed at 10% of the regular input rate (90% discount) when ≥ 1,024 tokens are cached.
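As a worked example of that rule, the sketch below computes how many input tokens are effectively billed when part of a prompt is served from cache. It applies only the 10% rate and the 1,024-token threshold stated above. It ignores any other charges (such as a cache-write surcharge), which is a simplifying assumption, and billable_input_tokens is a hypothetical helper.

```python
CACHE_MIN_TOKENS = 1024  # threshold stated above
CACHED_DIVISOR = 10      # cached tokens are billed at 1/10 of the input rate


def billable_input_tokens(input_tokens, cached_tokens):
    """Return the input-token count to bill at the regular input rate."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if cached_tokens < CACHE_MIN_TOKENS:
        # Below the threshold the discount does not apply.
        return float(input_tokens)
    uncached = input_tokens - cached_tokens
    return uncached + cached_tokens / CACHED_DIVISOR


if __name__ == "__main__":
    # 2,000 uncached tokens plus 8,000 cached tokens billed at 10%.
    print(billable_input_tokens(10_000, 8_000))
```

With 10,000 input tokens of which 8,000 are cached, the billed equivalent is 2,000 + 800 = 2,800 tokens.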

See Cortex REST API Billing & Cost Analysis for details.

Embeddings

from litellm import embedding
import os

os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-pat>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"

response = embedding(
    model="snowflake/snowflake-arctic-embed-l-v2.0",
    input=["Snowflake Cortex provides LLM inference"],
)
print(response.data[0]["embedding"][:5])
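Embedding vectors are usually compared rather than inspected directly. A minimal sketch of cosine similarity over two vectors, such as two response.data[i]["embedding"] lists, follows; cosine_similarity is written here in plain Python and is not part of LiteLLM.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy vectors; real embeddings have many more dimensions.
    print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```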

Supported Models

All models are available through the snowflake/ prefix.

Chat Completion Models

Model	litellm model name	Function Calling	Vision	Prompt Caching
Claude Sonnet 4.5	snowflake/claude-sonnet-4-5	✅	✅	✅
Claude Sonnet 4.6	snowflake/claude-sonnet-4-6	✅	✅	✅
Claude 4 Sonnet	snowflake/claude-4-sonnet	✅	✅	✅
Claude 4 Opus	snowflake/claude-4-opus	✅	✅	✅
Claude Haiku 4.5	snowflake/claude-haiku-4-5	✅	✅	✅
Claude 3.7 Sonnet	snowflake/claude-3-7-sonnet	✅	✅	✅
Claude 3.5 Sonnet	snowflake/claude-3-5-sonnet	✅	✅	✅
OpenAI GPT-4.1	snowflake/openai-gpt-4.1	✅	✅	✅
OpenAI GPT-5	snowflake/openai-gpt-5	✅	✅	✅
OpenAI GPT-5 Mini	snowflake/openai-gpt-5-mini	✅
OpenAI GPT-5 Nano	snowflake/openai-gpt-5-nano	✅
DeepSeek R1	snowflake/deepseek-r1
Mistral Large 2	snowflake/mistral-large2	✅
Llama 3.1 8B	snowflake/llama3.1-8b
Llama 3.1 70B	snowflake/llama3.1-70b	✅
Llama 3.1 405B	snowflake/llama3.1-405b	✅
Llama 3.3 70B	snowflake/llama3.3-70b	✅
Llama 4 Maverick	snowflake/llama4-maverick	✅
Snowflake Llama 3.3 70B	snowflake/snowflake-llama-3.3-70b	✅

Embedding Models

Model	litellm model name
Snowflake Arctic Embed L v2.0	snowflake/snowflake-arctic-embed-l-v2.0
Snowflake Arctic Embed M v2.0	snowflake/snowflake-arctic-embed-m-v2.0
