IBM watsonx.ai | liteLLM

LiteLLM supports all IBM watsonx.ai foundation models and embedding models.

Environment Variables

import os

os.environ["WATSONX_URL"] = ""  # (required) Base URL of your watsonx.ai instance
# (required) one of the following:
os.environ["WATSONX_APIKEY"] = ""  # IBM Cloud API key
os.environ["WATSONX_TOKEN"] = ""  # IAM auth token
# (optional) can also be passed as params to completion() or embedding()
os.environ["WATSONX_PROJECT_ID"] = ""  # Project ID of your watsonx.ai instance
os.environ["WATSONX_DEPLOYMENT_SPACE_ID"] = ""  # ID of the deployment space, needed to call deployed models
os.environ["WATSONX_ZENAPIKEY"] = ""  # Zen API key (for long-term API authentication)

See the IBM documentation for more information on how to get an access token to authenticate to watsonx.ai.
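
A quick pre-flight check can catch missing settings before the first request. The sketch below is not part of LiteLLM: `check_watsonx_env` is a hypothetical helper, and treating `WATSONX_ZENAPIKEY` as an acceptable credential alongside the API key and IAM token is an assumption based on the Zen API key section of this page.

```python
import os

# Hypothetical helper (not a LiteLLM function): reports missing watsonx settings.
def check_watsonx_env(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("WATSONX_URL"):
        problems.append("WATSONX_URL is not set")
    # Assumption: any one of these three credentials is sufficient.
    auth_vars = ("WATSONX_APIKEY", "WATSONX_TOKEN", "WATSONX_ZENAPIKEY")
    if not any(env.get(name) for name in auth_vars):
        problems.append("set one of: " + ", ".join(auth_vars))
    return problems

for problem in check_watsonx_env():
    print("watsonx config:", problem)
```

Passing a plain dict instead of relying on `os.environ` makes the check easy to exercise in tests.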

Usage

Chat Completion

import os
from litellm import completion

os.environ["WATSONX_URL"] = ""
os.environ["WATSONX_APIKEY"] = ""

response = completion(
  model="watsonx/meta-llama/llama-3-1-8b-instruct",
  messages=[{ "content": "what is your favorite colour?","role": "user"}],
  project_id="<my-project-id>"
)
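
LiteLLM returns responses in the OpenAI chat format, so the generated text normally sits under the first choice's message. The following is a minimal sketch of reading it; `reply_text` is a hypothetical helper, and the `SimpleNamespace` object only stands in for a real response so the snippet runs without credentials.

```python
from types import SimpleNamespace

# Hypothetical helper: pull the assistant text out of an OpenAI-style response.
def reply_text(response):
    """Return the content of the first choice's message."""
    return response.choices[0].message.content

# Stand-in for the object returned by completion(); not real model output.
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(role="assistant", content="Blue."))]
)
print(reply_text(fake_response))
```

With a live call, the same helper would be applied to the `response` returned by `completion(...)` above.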

Usage - Streaming

import os
from litellm import completion

os.environ["WATSONX_URL"] = ""
os.environ["WATSONX_APIKEY"] = ""
os.environ["WATSONX_PROJECT_ID"] = ""

response = completion(
  model="watsonx/meta-llama/llama-3-1-8b-instruct",
  messages=[{ "content": "what is your favorite colour?","role": "user"}],
  stream=True
)
for chunk in response:
  print(chunk)
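
Printing raw chunks is useful for debugging, but most callers want the assembled text. The sketch below assumes the chunks follow the OpenAI streaming shape, with incremental text in `choices[0].delta.content`; `collect_stream_text` is a hypothetical helper, and the fake chunks stand in for a real stream.

```python
from types import SimpleNamespace

# Hypothetical helper: join the text deltas of an OpenAI-style stream.
def collect_stream_text(chunks):
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        text = getattr(chunk.choices[0].delta, "content", None)
        if text:  # the delta content can be None, e.g. on the final chunk
            parts.append(text)
    return "".join(parts)

def _fake_chunk(text):
    # Stand-in for a streamed chunk; not real model output.
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

demo_chunks = [_fake_chunk("Blue"), _fake_chunk(None), _fake_chunk(", probably.")]
print(collect_stream_text(demo_chunks))
```

With a live stream, `collect_stream_text(response)` would replace the `for chunk in response` loop.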

Usage - Models in deployment spaces

Models deployed to a deployment space (e.g., tuned models) can be called using the watsonx/deployment/<deployment_id> model format, together with the space_id of the deployment space.

Deployment Space

import litellm

response = litellm.completion(
    model="watsonx/deployment/<deployment_id>",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    space_id="<deployment_space_id>"
)
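
When deployment IDs come from configuration, it can help to build the call arguments in one place. This is a sketch only: `deployment_completion_kwargs` is a hypothetical helper that assembles the same `model`, `messages`, and `space_id` arguments used in the example above.

```python
# Hypothetical helper: build keyword arguments for litellm.completion()
# when targeting a model in a deployment space.
def deployment_completion_kwargs(deployment_id, space_id, messages):
    deployment_id = deployment_id.strip()
    if not deployment_id:
        raise ValueError("deployment_id must not be empty")
    return {
        "model": f"watsonx/deployment/{deployment_id}",
        "messages": messages,
        "space_id": space_id,
    }

kwargs = deployment_completion_kwargs(
    "<deployment_id>",
    "<deployment_space_id>",
    [{"content": "Hello, how are you?", "role": "user"}],
)
print(kwargs["model"])
```

The resulting dict can then be passed as `litellm.completion(**kwargs)`.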

Usage - Embeddings

from litellm import embedding

response = embedding(
    model="watsonx/ibm/slate-30m-english-rtrvr",
    input=["What is the capital of France?"],
    project_id="<my-project-id>"
)
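
The embedding response follows the OpenAI layout, with one entry per input under `data`. The sketch below assumes each entry exposes its vector either as a dict key or as an attribute named `embedding`, since the exact item type may vary across LiteLLM versions; `embedding_vectors` is a hypothetical helper, and the fake response stands in for a real one.

```python
from types import SimpleNamespace

# Hypothetical helper: extract the raw vectors from an OpenAI-style
# embedding response, accepting dict items or attribute-style items.
def embedding_vectors(response):
    vectors = []
    for item in response.data:
        if isinstance(item, dict):
            vectors.append(item["embedding"])
        else:
            vectors.append(item.embedding)
    return vectors

# Stand-in response with made-up numbers; not real model output.
fake_embedding_response = SimpleNamespace(
    data=[{"embedding": [0.1, 0.2]}, SimpleNamespace(embedding=[0.3, 0.4])]
)
print(len(embedding_vectors(fake_embedding_response)))
```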

LiteLLM Proxy Usage

1. Save keys in your environment

export WATSONX_URL=""
export WATSONX_APIKEY=""
export WATSONX_PROJECT_ID=""

2. Start the proxy

litellm --model watsonx/meta-llama/llama-3-8b-instruct

3. Test it

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
      "model": "llama-3-8b",
      "messages": [
        {
          "role": "user",
          "content": "what is your favorite colour?"
        }
      ]
    }'
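
The same test request can be sent from Python using only the standard library. This is a sketch mirroring the curl command above: the URL, model name, and prompt are copied from it, while `build_chat_request` and `send_chat_request` are hypothetical helper names. Building the request is separated from sending it so the payload can be inspected without a running proxy.

```python
import json
import urllib.request

# Hypothetical helper: construct the POST request shown in the curl example.
def build_chat_request(base_url="http://0.0.0.0:4000",
                       model="llama-3-8b",
                       prompt="what is your favorite colour?"):
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical helper: send the request and decode the JSON reply.
# Requires the proxy from step 2 to be running.
def send_chat_request(request, timeout=30):
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

request = build_chat_request()
print(request.get_method(), request.full_url)
```

Calling `send_chat_request(request)` against a running proxy should return the decoded chat completion as a dict.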

Supported Models

| Model Name | Command |
| --- | --- |
| Llama 3.1 8B Instruct | completion(model="watsonx/meta-llama/llama-3-1-8b-instruct", messages=messages) |
| Llama 2 70B Chat | completion(model="watsonx/meta-llama/llama-2-70b-chat", messages=messages) |
| Granite 13B Chat V2 | completion(model="watsonx/ibm/granite-13b-chat-v2", messages=messages) |
| Mixtral 8X7B Instruct | completion(model="watsonx/ibm-mistralai/mixtral-8x7b-instruct-v01-q", messages=messages) |

For all available models, see the watsonx.ai documentation.

Supported Embedding Models

| Model Name | Function Call |
| --- | --- |
| Slate 30m | embedding(model="watsonx/ibm/slate-30m-english-rtrvr", input=input) |
| Slate 125m | embedding(model="watsonx/ibm/slate-125m-english-rtrvr", input=input) |

For all available embedding models, see the watsonx.ai embedding documentation.

Advanced

Using Zen API Key

You can use a Zen API key for long-term authentication instead of generating IAM tokens. Pass it either as an environment variable or as a parameter:

import os
from litellm import completion

# Option 1: Set as environment variable
os.environ["WATSONX_ZENAPIKEY"] = "your-zen-api-key"

response = completion(
    model="watsonx/ibm/granite-13b-chat-v2",
    messages=[{"content": "What is your favorite color?", "role": "user"}],
    project_id="your-project-id"
)

# Option 2: Pass as parameter
response = completion(
    model="watsonx/ibm/granite-13b-chat-v2",
    messages=[{"content": "What is your favorite color?", "role": "user"}],
    zen_api_key="your-zen-api-key",
    project_id="your-project-id"
)

To use a Zen API key with the LiteLLM Proxy through the OpenAI client, pass it in extra_body:

import openai

client = openai.OpenAI(
    api_key="sk-1234",  # LiteLLM proxy key
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="watsonx/ibm/granite-3-3-8b-instruct",
    messages=[{"role": "user", "content": "What is your favorite color?"}],
    max_tokens=2048,
    extra_body={
        "project_id": "your-project-id",
        "zen_api_key": "your-zen-api-key"
    }
)

See IBM documentation for more information on generating Zen API keys.