Snowflake Cortex | liteLLM

LiteLLM supports all models on the Snowflake Cortex REST API, including models from Anthropic (Claude), OpenAI (GPT), Meta (Llama), Mistral, DeepSeek, and Snowflake.

Description: Snowflake Cortex REST API provides access to leading frontier LLMs through OpenAI-compatible and Anthropic-compatible endpoints. All inference runs within Snowflake's security perimeter.
Provider Route on LiteLLM: snowflake/
Provider Docs: Cortex REST API
API Endpoints:
    Chat Completions: https://{account}.snowflakecomputing.com/api/v2/cortex/v1/chat/completions
    Messages: https://{account}.snowflakecomputing.com/api/v2/cortex/v1/messages
    Legacy: https://{account}.snowflakecomputing.com/api/v2/cortex/inference:complete
Supported OpenAI Endpoints: /chat/completions, /completions, /embeddings
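
The endpoint URLs above differ only in the account identifier, so they can be derived from it. The following is a minimal sketch; `cortex_endpoints` and `CORTEX_PATHS` are hypothetical names introduced here for illustration and are not part of LiteLLM.

```python
# Paths copied from the API Endpoints list above.
CORTEX_PATHS = {
    "chat_completions": "/api/v2/cortex/v1/chat/completions",
    "messages": "/api/v2/cortex/v1/messages",
    "legacy": "/api/v2/cortex/inference:complete",
}


def cortex_endpoints(account: str) -> dict[str, str]:
    """Return the endpoint URLs for one account identifier (hypothetical helper)."""
    host = f"https://{account}.snowflakecomputing.com"
    return {name: host + path for name, path in CORTEX_PATHS.items()}


if __name__ == "__main__":
    # "myorg-myaccount" is a placeholder account identifier.
    for name, url in cortex_endpoints("myorg-myaccount").items():
        print(f"{name}: {url}")
```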

Tip: We support all Snowflake Cortex models. Prefix the model name with snowflake/ when sending LiteLLM requests, for example model="snowflake/<model-name>".
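
If model names come from configuration, a small helper can add the route prefix consistently. This is only a sketch; `with_snowflake_prefix` is a hypothetical name, not a LiteLLM function.

```python
PROVIDER_PREFIX = "snowflake/"


def with_snowflake_prefix(model_name: str) -> str:
    """Add the snowflake/ route prefix unless it is already present."""
    if model_name.startswith(PROVIDER_PREFIX):
        return model_name
    return PROVIDER_PREFIX + model_name
```

The helper is idempotent, so it is safe to apply to names that may already carry the prefix.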

Authentication

The Snowflake Cortex REST API supports three authentication methods. The examples below cover two of them: programmatic access tokens (PAT) and key-pair JWTs.

Programmatic Access Token (PAT)

This is the simplest approach. Generate a PAT in Snowsight under User Menu → My Profile → Programmatic Access Tokens.

import os
from litellm import completion

os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-programmatic-access-token>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"

response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
)

JWT (Key-Pair Authentication)

Generate a JWT from a Snowflake key pair. See Key-pair authentication.

import os
from litellm import completion

os.environ["SNOWFLAKE_JWT"] = "<your-jwt-token>"
os.environ["SNOWFLAKE_ACCOUNT_ID"] = "<orgname>-<account_name>"

response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
)
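
The example above assumes the JWT already exists. As a rough sketch of what goes into it, the code below builds the claims using only the standard library. The claim layout (`iss`, `sub`, `iat`, `exp`, and a `SHA256:` public-key fingerprint) reflects our understanding of Snowflake's key-pair documentation and should be verified against it. The function names and the 59-minute lifetime are assumptions made for illustration, and signing is not shown.

```python
import base64
import hashlib
import time


def public_key_fingerprint(public_key_der: bytes) -> str:
    """SHA-256 of the DER-encoded public key, base64-encoded, with a SHA256: prefix."""
    digest = hashlib.sha256(public_key_der).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii")


def snowflake_jwt_claims(account, user, public_key_der, lifetime_seconds=3540, now=None):
    """Build JWT claims for key-pair auth (assumed layout, verify before use)."""
    issued_at = int(time.time()) if now is None else int(now)
    # Account and user are upper-cased and joined with a dot.
    qualified_user = f"{account.upper()}.{user.upper()}"
    return {
        "iss": f"{qualified_user}.{public_key_fingerprint(public_key_der)}",
        "sub": qualified_user,
        "iat": issued_at,
        "exp": issued_at + lifetime_seconds,
    }
```

The claims still have to be signed with the private key using RS256, for example with a JWT library such as PyJWT, before the result is placed in SNOWFLAKE_JWT.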

Pass credentials as parameters

from litellm import completion

# Using PAT
response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key="pat/<your-pat-token>",
    api_base="https://<account>.snowflakecomputing.com/api/v2/cortex/v1",
)

# Using JWT
response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key="<your-jwt-token>",
    account_id="<orgname>-<account_name>",
)

For all authentication options, see Authenticating to Cortex REST API.
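
When the same code must run with either credential type, the keyword arguments can be chosen from whichever environment variables are set. The sketch below uses the variable names shown in this section; `snowflake_credentials` is a hypothetical helper, and preferring a PAT over a JWT is an arbitrary choice.

```python
import os


def snowflake_credentials(env=None) -> dict:
    """Return keyword arguments for completion(), preferring a PAT over a JWT."""
    env = os.environ if env is None else env
    if env.get("SNOWFLAKE_API_KEY") and env.get("SNOWFLAKE_API_BASE"):
        return {
            "api_key": env["SNOWFLAKE_API_KEY"],
            "api_base": env["SNOWFLAKE_API_BASE"],
        }
    if env.get("SNOWFLAKE_JWT") and env.get("SNOWFLAKE_ACCOUNT_ID"):
        return {
            "api_key": env["SNOWFLAKE_JWT"],
            "account_id": env["SNOWFLAKE_ACCOUNT_ID"],
        }
    raise RuntimeError("No complete set of Snowflake credentials found")
```

The result can then be unpacked into a call, as in completion(model=..., messages=..., **snowflake_credentials()).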

Usage

from litellm import completion
import os

os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-pat>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"

response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "What is Snowflake Cortex?"}],
)
print(response.choices[0].message.content)

Supported OpenAI Parameters

temperature, max_tokens, top_p, stream, response_format,
tools, tool_choice
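
To check a set of request options against this list before sending them, something like the following can be used. It is a sketch only: `SUPPORTED_PARAMS` is copied from the list above and `split_openai_params` is a hypothetical helper, not part of LiteLLM.

```python
SUPPORTED_PARAMS = frozenset(
    {
        "temperature",
        "max_tokens",
        "top_p",
        "stream",
        "response_format",
        "tools",
        "tool_choice",
    }
)


def split_openai_params(params: dict) -> tuple[dict, dict]:
    """Split params into (supported, unsupported) according to the list above."""
    supported = {k: v for k, v in params.items() if k in SUPPORTED_PARAMS}
    unsupported = {k: v for k, v in params.items() if k not in SUPPORTED_PARAMS}
    return supported, unsupported
```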

Streaming

from litellm import completion
import os

os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-pat>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"

response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Write a haiku about data."}],
    stream=True,
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Function Calling

Function calling is supported on Claude and select other models. LiteLLM automatically transforms the OpenAI tool format into Snowflake's tool_spec format.

from litellm import completion
import os

os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-pat>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"],
            },
        },
    }
]

response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    tools=tools,
    tool_choice="auto",
)

print(response.choices[0].message.tool_calls)
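
The response above only contains the requested tool calls. Running them and returning the results is left to the caller. The sketch below assumes OpenAI-style tool call objects, where `function.arguments` is a JSON string. `TOOL_REGISTRY`, `run_tool_call`, and the stand-in `get_weather` are hypothetical names introduced for illustration.

```python
import json
from types import SimpleNamespace


def get_weather(location: str) -> str:
    """Stand-in implementation; a real one would query a weather service."""
    return f"Sunny in {location}"


# Maps tool names declared in `tools` to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call) -> dict:
    """Execute one tool call and return an OpenAI-style tool result message."""
    arguments = json.loads(tool_call.function.arguments or "{}")
    handler = TOOL_REGISTRY[tool_call.function.name]
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": str(handler(**arguments)),
    }


if __name__ == "__main__":
    # Fake tool call so the example runs without a network request.
    fake = SimpleNamespace(
        id="call_1",
        function=SimpleNamespace(name="get_weather", arguments='{"location": "San Francisco"}'),
    )
    print(run_tool_call(fake))
```

To continue the conversation, append the assistant message and each tool result message to `messages`, then call completion again.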

Thinking / Reasoning

Claude 3.7 Sonnet, Claude 4 Opus, and DeepSeek R1 on Cortex support extended thinking. LiteLLM translates reasoning_effort to the provider's thinking parameter.

reasoning_effort    budget_tokens
"low"               1024
"medium"            2048
"high"              4096

from litellm import completion

response = completion(
    model="snowflake/claude-3-7-sonnet",
    messages=[{"role": "user", "content": "Solve: what is 127 * 389?"}],
    reasoning_effort="low",
)
print(response.choices[0].message.content)
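
The mapping from reasoning_effort to a token budget can be written as a lookup. This sketch uses the budgets from the table in this section. The shape of the returned dictionary mirrors Anthropic's thinking parameter and is an assumption about what is sent, not a description of LiteLLM internals. `thinking_param` is a hypothetical name.

```python
# Budgets taken from the reasoning_effort table in this section.
REASONING_BUDGETS = {"low": 1024, "medium": 2048, "high": 4096}


def thinking_param(reasoning_effort: str) -> dict:
    """Translate a reasoning_effort level into an assumed thinking parameter."""
    try:
        budget = REASONING_BUDGETS[reasoning_effort]
    except KeyError:
        valid = ", ".join(REASONING_BUDGETS)
        raise ValueError(f"reasoning_effort must be one of: {valid}") from None
    return {"type": "enabled", "budget_tokens": budget}
```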

Prompt Caching

Snowflake Cortex supports prompt caching to reduce costs. Cached input tokens are billed at 10% of the regular input rate (a 90% discount) when at least 1,024 tokens are cached.

See Cortex REST API Billing & Cost Analysis for details.
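
As a worked example of the discount, the sketch below estimates input cost for a request with some cached tokens. The price passed in is a made-up figure, `input_cost` is a hypothetical helper, and treating fewer than 1,024 cached tokens as uncached is our reading of the threshold above rather than a confirmed billing rule.

```python
CACHE_MIN_TOKENS = 1024  # threshold stated above
CACHED_RATE = 0.10  # cached tokens billed at 10% of the input rate


def input_cost(input_tokens: int, cached_tokens: int, price_per_million: float) -> float:
    """Estimate input cost when part of the prompt is served from cache."""
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    if cached_tokens < CACHE_MIN_TOKENS:
        cached_tokens = 0  # assumed: below the threshold, no discount applies
    uncached_tokens = input_tokens - cached_tokens
    billable_tokens = uncached_tokens + cached_tokens * CACHED_RATE
    return billable_tokens * price_per_million / 1_000_000


if __name__ == "__main__":
    # Hypothetical price of 3.00 per million input tokens.
    print(input_cost(10_000, 8_000, 3.0))
```

With 10,000 input tokens of which 8,000 are cached, the billable amount is 2,000 + 800 = 2,800 token-equivalents instead of 10,000.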

Embeddings

from litellm import embedding
import os

os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-pat>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"

response = embedding(
    model="snowflake/snowflake-arctic-embed-l-v2.0",
    input=["Snowflake Cortex provides LLM inference"],
)
print(response.data[0]["embedding"][:5])
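
Embedding vectors are usually compared with cosine similarity. The following is a minimal, dependency-free sketch of that comparison. `cosine_similarity` is a helper written for this example. The inputs would be the `embedding` lists from `response.data`.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

For a request with two inputs, the comparison would be cosine_similarity(response.data[0]["embedding"], response.data[1]["embedding"]).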

Supported Models

All models are available through the snowflake/ prefix.

Chat Completion Models

Model                     litellm model name                   Function Calling   Vision   Prompt Caching
Claude Sonnet 4.5         snowflake/claude-sonnet-4-5
Claude Sonnet 4.6         snowflake/claude-sonnet-4-6
Claude 4 Sonnet           snowflake/claude-4-sonnet
Claude 4 Opus             snowflake/claude-4-opus
Claude Haiku 4.5          snowflake/claude-haiku-4-5
Claude 3.7 Sonnet         snowflake/claude-3-7-sonnet
Claude 3.5 Sonnet         snowflake/claude-3-5-sonnet
OpenAI GPT-4.1            snowflake/openai-gpt-4.1
OpenAI GPT-5              snowflake/openai-gpt-5
OpenAI GPT-5 Mini         snowflake/openai-gpt-5-mini
OpenAI GPT-5 Nano         snowflake/openai-gpt-5-nano
DeepSeek R1               snowflake/deepseek-r1
Mistral Large 2           snowflake/mistral-large2
Llama 3.1 8B              snowflake/llama3.1-8b
Llama 3.1 70B             snowflake/llama3.1-70b
Llama 3.1 405B            snowflake/llama3.1-405b
Llama 3.3 70B             snowflake/llama3.3-70b
Llama 4 Maverick          snowflake/llama4-maverick
Snowflake Llama 3.3 70B   snowflake/snowflake-llama-3.3-70b

Embedding Models

Model                           litellm model name
Snowflake Arctic Embed L v2.0   snowflake/snowflake-arctic-embed-l-v2.0
Snowflake Arctic Embed M v2.0   snowflake/snowflake-arctic-embed-m-v2.0