Mistral AI API | liteLLM

https://docs.mistral.ai/api/

API Key

# env variable
os.environ['MISTRAL_API_KEY']
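
A missing key usually surfaces only as an authentication error at request time. The following is a minimal sketch of a fail-fast check; `require_env` is a hypothetical helper, not part of LiteLLM, and the demo reads from a stand-in dictionary rather than the real environment.

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable or raise a clear error."""
    env = os.environ if env is None else env
    value = (env.get(name) or "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before calling LiteLLM")
    return value


# Demo against a stand-in mapping so the example does not depend on your shell
demo_env = {"MISTRAL_API_KEY": "your-api-key"}
print(require_env("MISTRAL_API_KEY", demo_env))
```

Calling `require_env("MISTRAL_API_KEY")` without the second argument checks the real process environment.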

Sample Usage

from litellm import completion
import os

os.environ['MISTRAL_API_KEY'] = ""
response = completion(
    model="mistral/mistral-tiny",
    messages=[
        {"role": "user", "content": "hello from litellm"}
    ],
)
print(response)

Sample Usage - Streaming

from litellm import completion
import os

os.environ['MISTRAL_API_KEY'] = ""
response = completion(
    model="mistral/mistral-tiny",
    messages=[
        {"role": "user", "content": "hello from litellm"}
    ],
    stream=True
)

for chunk in response:
    print(chunk)
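
Printing each chunk shows the raw stream objects. To rebuild the full reply text, the deltas can be concatenated. The sketch below assumes each chunk follows the OpenAI-style shape `chunk.choices[0].delta.content`; `collect_stream_text` and `fake_chunk` are hypothetical helpers, and the demo uses stand-in chunks so it runs without an API key.

```python
from types import SimpleNamespace


def collect_stream_text(chunks):
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks may carry no choices
        content = getattr(chunk.choices[0].delta, "content", None)
        if content:  # skip None or empty deltas (e.g. the final chunk)
            parts.append(content)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in chunk with the assumed OpenAI-style shape."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_stream = [fake_chunk("Hello"), fake_chunk(", "), fake_chunk("world"), fake_chunk(None)]
print(collect_stream_text(demo_stream))
```

With a real stream, pass the `response` returned by `completion(..., stream=True)` in place of `demo_stream`.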

Usage with LiteLLM Proxy

1. Set Mistral Models on config.yaml

model_list:
  - model_name: mistral-small-latest
    litellm_params:
      model: mistral/mistral-small-latest
      api_key: "os.environ/MISTRAL_API_KEY" # ensure you have `MISTRAL_API_KEY` in your .env

2. Start Proxy

litellm --config config.yaml

3. Test it

Curl Request

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data ' {
      "model": "mistral-small-latest",
      "messages": [
        {
          "role": "user",
          "content": "what llm are you"
        }
      ]
    }
'
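
The same request can be issued from Python. This is a sketch using only the standard library; it assumes the proxy is listening on `http://0.0.0.0:4000` as in the curl example, and `build_chat_request` and `send_chat_request` are hypothetical helpers. The demo only builds the request, so it runs without a proxy.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_content, api_key=None):
    """Build a POST request equivalent to the curl call above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    headers = {"Content-Type": "application/json"}
    if api_key:  # only needed if the proxy is configured to require a key
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send_chat_request(request, timeout=30):
    """Send the request and decode the JSON body (requires a running proxy)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


request = build_chat_request("http://0.0.0.0:4000", "mistral-small-latest", "what llm are you")
print(request.full_url)
print(request.data.decode("utf-8"))
```

Once the proxy is running, `send_chat_request(request)` returns the parsed response.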

Supported Models

Model Name	Function Call	Reasoning Support
Mistral Small	completion(model="mistral/mistral-small-latest", messages)	No
Mistral Medium	completion(model="mistral/mistral-medium-latest", messages)	No
Mistral Large 2	completion(model="mistral/mistral-large-2407", messages)	No
Mistral Large Latest	completion(model="mistral/mistral-large-latest", messages)	No
Magistral Small	completion(model="mistral/magistral-small-2506", messages)	Yes
Magistral Medium	completion(model="mistral/magistral-medium-2506", messages)	Yes
Mistral 7B	completion(model="mistral/open-mistral-7b", messages)	No
Mixtral 8x7B	completion(model="mistral/open-mixtral-8x7b", messages)	No
Mixtral 8x22B	completion(model="mistral/open-mixtral-8x22b", messages)	No
Codestral	completion(model="mistral/codestral-latest", messages)	No
Mistral NeMo	completion(model="mistral/open-mistral-nemo", messages)	No
Mistral NeMo 2407	completion(model="mistral/open-mistral-nemo-2407", messages)	No
Codestral Mamba	completion(model="mistral/open-codestral-mamba", messages)	No
Codestral Mamba	completion(model="mistral/codestral-mamba-latest", messages)	No

Function Calling

from litellm import completion
import os

# set env
os.environ["MISTRAL_API_KEY"] = "your-api-key"

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]
messages = [{"role": "user", "content": "What's the weather like in Boston today?"}]

response = completion(
    model="mistral/mistral-large-latest",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)
# Add any assertions here to check the response arguments
print(response)
assert isinstance(response.choices[0].message.tool_calls[0].function.name, str)
assert isinstance(
    response.choices[0].message.tool_calls[0].function.arguments, str
)
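
The assertions above show that `function.arguments` arrives as a JSON string. A typical next step is to parse it and run a matching local function. The sketch below illustrates that step under assumptions: the `get_current_weather` implementation, the `AVAILABLE_TOOLS` registry, and `run_tool_calls` are hypothetical, and the demo uses a stand-in message object instead of a live response.

```python
import json
from types import SimpleNamespace


def get_current_weather(location, unit="fahrenheit"):
    """Hypothetical local tool; returns canned data for illustration."""
    return {"location": location, "temperature": "72", "unit": unit}


# Map tool names (as declared in `tools`) to local callables
AVAILABLE_TOOLS = {"get_current_weather": get_current_weather}


def run_tool_calls(message):
    """Execute each requested tool call and return OpenAI-style tool messages."""
    results = []
    for call in message.tool_calls or []:
        func = AVAILABLE_TOOLS[call.function.name]
        kwargs = json.loads(call.function.arguments)  # arguments is a JSON string
        results.append({
            "role": "tool",
            "tool_call_id": call.id,
            "name": call.function.name,
            "content": json.dumps(func(**kwargs)),
        })
    return results


# Stand-in for response.choices[0].message
fake_message = SimpleNamespace(tool_calls=[
    SimpleNamespace(
        id="call_0",
        function=SimpleNamespace(
            name="get_current_weather",
            arguments='{"location": "Boston, MA"}',
        ),
    )
])
tool_messages = run_tool_calls(fake_message)
print(tool_messages)
```

The returned messages can be appended to `messages`, after the assistant message that requested the calls, before calling `completion` again.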

Reasoning

Mistral does not directly support reasoning; instead, it recommends a specific system prompt for use with its magistral models. When you set the reasoning_effort parameter, LiteLLM prepends that system prompt to the request.

If an existing system message is provided, LiteLLM will send both as a list of system messages (you can verify this by enabling litellm._turn_on_debug()).
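
The prepending behavior described above can be pictured with a small sketch. This is an illustration only, not LiteLLM's actual implementation: `with_reasoning_prompt` is a hypothetical function and `PLACEHOLDER_REASONING_PROMPT` is a placeholder, not the prompt Mistral recommends.

```python
# Placeholder text; the real prompt recommended by Mistral is different
PLACEHOLDER_REASONING_PROMPT = "Think step-by-step before giving your final answer."


def with_reasoning_prompt(messages, reasoning_effort=None,
                          prompt=PLACEHOLDER_REASONING_PROMPT):
    """Return a new message list with a reasoning system message in front.

    If reasoning_effort is not set, the messages are returned unchanged.
    Any existing system message is kept, so two system messages are sent.
    """
    if reasoning_effort is None:
        return list(messages)
    return [{"role": "system", "content": prompt}] + list(messages)


original = [
    {"role": "system", "content": "You are a helpful math tutor."},
    {"role": "user", "content": "Explain how to solve quadratic equations."},
]
for message in with_reasoning_prompt(original, reasoning_effort="high"):
    print(message["role"], "->", message["content"])
```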

Supported Models

Model Name	Function Call
Magistral Small	completion(model="mistral/magistral-small-2506", messages)
Magistral Medium	completion(model="mistral/magistral-medium-2506", messages)

Using Reasoning Effort

When used with magistral models, the reasoning_effort parameter controls how much effort the model puts into reasoning.

from litellm import completion
import os

os.environ['MISTRAL_API_KEY'] = "your-api-key"

response = completion(
    model="mistral/magistral-medium-2506",
    messages=[
        {"role": "user", "content": "What is 15 multiplied by 7?"}
    ],
    reasoning_effort="medium"  # Options: "low", "medium", "high"
)

print(response)

Example with System Message

If you already have a system message, LiteLLM will prepend the reasoning instructions:

response = completion(
    model="mistral/magistral-medium-2506",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor."},
        {"role": "user", "content": "Explain how to solve quadratic equations."}
    ],
    reasoning_effort="high"
)

# The system message becomes:
# "When solving problems, think step-by-step in <think> tags before providing your final answer...
#  
#  You are a helpful math tutor."
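
Because the prompt asks the model to reason inside `<think>` tags, the reasoning can be separated from the final answer after the call. The sketch below assumes the tags appear verbatim in the message content, which depends on the model following the prompt; `split_reasoning` is a hypothetical helper and the sample string is made up.

```python
import re

# Non-greedy match so several <think> blocks are handled separately
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(content):
    """Return (reasoning, answer) from text that may contain <think> blocks."""
    reasoning = "\n".join(block.strip() for block in THINK_BLOCK.findall(content))
    answer = THINK_BLOCK.sub("", content).strip()
    return reasoning, answer


sample = "<think>15 * 7 = 105</think>\nThe answer is 105."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

With a live response, pass `response.choices[0].message.content` instead of `sample`. Text without tags comes back unchanged as the answer, with an empty reasoning string.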

Usage with LiteLLM Proxy

You can also use reasoning capabilities through the LiteLLM proxy:

Curl Request

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
      "model": "magistral-medium-2506",
      "messages": [
        {
          "role": "user",
          "content": "What is the square root of 144? Show your reasoning."
        }
      ],
      "reasoning_effort": "medium"
    }'

Important Notes

Model Compatibility: Reasoning parameters only work with magistral models
Backward Compatibility: Non-magistral models will ignore reasoning parameters and work normally

Audio Transcription

Use Mistral's Voxtral models for audio transcription via litellm.transcription().

SDK Usage

from litellm import transcription
import os

os.environ["MISTRAL_API_KEY"] = ""

audio_file = open("path/to/audio.wav", "rb")

response = transcription(
    model="mistral/voxtral-mini-latest",
    file=audio_file,
)

print(response.text)

With Optional Parameters

response = transcription(
    model="mistral/voxtral-mini-latest",
    file=audio_file,
    language="en",
    temperature=0.0,
    response_format="json",
)

Mistral-Specific Parameters

Mistral supports additional parameters beyond the OpenAI-compatible ones:

Parameter	Type	Description
diarize	bool	Enable speaker diarization

response = transcription(
    model="mistral/voxtral-mini-latest",
    file=audio_file,
    diarize=True,
)

Usage with LiteLLM Proxy

model_list:
  - model_name: voxtral
    litellm_params:
      model: mistral/voxtral-mini-latest
      api_key: os.environ/MISTRAL_API_KEY
    model_info:
      mode: audio_transcription

litellm --config /path/to/config.yaml

curl --location 'http://0.0.0.0:4000/v1/audio/transcriptions' \
--header 'Authorization: Bearer sk-1234' \
--form 'file=@"audio.wav"' \
--form 'model="voxtral"'

Sample Usage - Embedding

from litellm import embedding
import os

os.environ['MISTRAL_API_KEY'] = ""
response = embedding(
    model="mistral/mistral-embed",
    input=["good morning from litellm"],
)
print(response)
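
Embeddings are usually compared rather than printed. The sketch below computes cosine similarity between vectors; it assumes the response follows the OpenAI-style layout in which each entry of `data` exposes an `"embedding"` list, and it uses a small made-up response so it runs offline.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up stand-in for an embedding response with three inputs
fake_response = {
    "data": [
        {"embedding": [1.0, 0.0]},
        {"embedding": [0.0, 1.0]},
        {"embedding": [2.0, 0.0]},
    ]
}
vectors = [item["embedding"] for item in fake_response["data"]]
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
```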

Supported Models

All models listed at https://docs.mistral.ai/platform/endpoints are supported.

Model Name	Function Call
Mistral Embeddings	embedding(model="mistral/mistral-embed", input)
