Configuring a “Stack”
The Llama Stack runtime configuration is specified as a YAML file. Here is a simplified version of an example configuration file for the Ollama distribution:
```yaml
version: 2
conda_env: ollama
apis:
- agents
- inference
- vector_io
- safety
- telemetry
providers:
  inference:
  - provider_id: ollama
    provider_type: remote::ollama
    config:
      url: ${env.OLLAMA_URL:http://localhost:11434}
  vector_io:
  - provider_id: faiss
    provider_type: inline::faiss
    config:
      kvstore:
        type: sqlite
        namespace: null
        db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/faiss_store.db
  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config: {}
  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence_store:
        type: sqlite
        namespace: null
        db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/agents_store.db
  telemetry:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config: {}
metadata_store:
  namespace: null
  type: sqlite
  db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/registry.db
models:
- metadata: {}
  model_id: ${env.INFERENCE_MODEL}
  provider_id: ollama
  provider_model_id: null
shields: []
server:
  port: 8321
  auth:
    provider_type: "kubernetes"
    config:
      api_server_url: "https://kubernetes.default.svc"
      ca_cert_path: "/path/to/ca.crt"
```
Let’s break this down into the different sections. The first section specifies the set of APIs that the stack server will serve:
```yaml
apis:
- agents
- inference
- vector_io
- safety
- telemetry
```
Providers
Next up is the most critical part: the set of providers that the stack will use to serve the above APIs. Consider the `inference` API:
```yaml
providers:
  inference:
    # provider_id is a string you can choose freely
  - provider_id: ollama
    # provider_type is a string that specifies the type of provider.
    # in this case, the provider for inference is ollama and it is run remotely (outside of the distribution)
    provider_type: remote::ollama
    # config is a dictionary that contains the configuration for the provider.
    # in this case, the configuration is the url of the ollama server
    config:
      url: ${env.OLLAMA_URL:http://localhost:11434}
```
A few things to note:
- A provider instance is identified with an (id, type, configuration) triplet.
- The id is a string you can choose freely.
- You can instantiate any number of provider instances of the same type.
- The configuration dictionary is provider-specific.
- Notice that configuration can reference environment variables (with default values), which are expanded at runtime. When you run a stack server (via docker or via `llama stack run`), you can specify `--env OLLAMA_URL=http://my-server:11434` to override the default value, as shown in the example below.
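For example, a minimal sketch of launching the server with an override, assuming your configuration is saved as `run.yaml` (an illustrative path):

```bash
# Start the stack server and override the OLLAMA_URL default from the config
llama stack run ./run.yaml --env OLLAMA_URL=http://my-server:11434
```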
Resources
Finally, let’s look at the `models` section:
```yaml
models:
- metadata: {}
  model_id: ${env.INFERENCE_MODEL}
  provider_id: ollama
  provider_model_id: null
```
A Model is an instance of a “Resource” (see Concepts) and is associated with a specific inference provider (in this case, the provider with identifier `ollama`). This is an instance of a “pre-registered” model. While we always encourage clients to register models before using them, some Stack servers may come up with a list of “already known and available” models.
What’s with the `provider_model_id` field? This is an identifier for the model inside the provider’s model catalog. Contrast it with `model_id`, which is the identifier for the same model for Llama Stack’s purposes. For example, you may want to name “llama3.2:vision-11b” as “image_captioning_model” when you use it in your Stack interactions. When omitted, the server will set `provider_model_id` to be the same as `model_id`.
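For instance, a minimal sketch of such an aliasing, reusing the illustrative names from the paragraph above:

```yaml
models:
- metadata: {}
  model_id: image_captioning_model        # the name used in your Stack interactions
  provider_id: ollama
  provider_model_id: llama3.2:vision-11b  # the name in the provider's model catalog
```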
Server Configuration
The `server` section configures the HTTP server that serves the Llama Stack APIs:
```yaml
server:
  port: 8321                         # Port to listen on (default: 8321)
  tls_certfile: "/path/to/cert.pem"  # Optional: Path to TLS certificate for HTTPS
  tls_keyfile: "/path/to/key.pem"    # Optional: Path to TLS key for HTTPS
  auth:                              # Optional: Authentication configuration
    provider_type: "kubernetes"      # Type of auth provider
    config:                          # Provider-specific configuration
      api_server_url: "https://kubernetes.default.svc"
      ca_cert_path: "/path/to/ca.crt"  # Optional: Path to CA certificate
```
Authentication Configuration
The `auth` section configures authentication for the server. When configured, all API requests must include a valid Bearer token in the Authorization header:

```
Authorization: Bearer <token>
```
The server supports multiple authentication providers:
Kubernetes Provider
The Kubernetes cluster must be configured to use a service account for authentication.
```bash
kubectl create namespace llama-stack
kubectl create serviceaccount llama-stack-auth -n llama-stack
kubectl create rolebinding llama-stack-auth-rolebinding --clusterrole=admin --serviceaccount=llama-stack:llama-stack-auth -n llama-stack
kubectl create token llama-stack-auth -n llama-stack > llama-stack-auth-token
```
Validates tokens against the Kubernetes API server:
```yaml
server:
  auth:
    provider_type: "kubernetes"
    config:
      api_server_url: "https://kubernetes.default.svc"  # URL of the Kubernetes API server
      ca_cert_path: "/path/to/ca.crt"                   # Optional: Path to CA certificate
```
The provider extracts user information from the JWT token:
- Username from the `sub` claim becomes a role
- Kubernetes groups become teams
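For instance, a service account token whose payload includes claims along these lines (a hypothetical sketch; exact values depend on your cluster) would map `sub` to a role and each entry of `groups` to a team:

```json
{
  "sub": "system:serviceaccount:llama-stack:llama-stack-auth",
  "groups": ["system:serviceaccounts", "system:serviceaccounts:llama-stack"]
}
```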
You can easily validate a request by running:
```bash
curl -s -L -H "Authorization: Bearer $(cat llama-stack-auth-token)" http://127.0.0.1:8321/v1/providers
```
Custom Provider
Validates tokens against a custom authentication endpoint:
```yaml
server:
  auth:
    provider_type: "custom"
    config:
      endpoint: "https://auth.example.com/validate"  # URL of the auth endpoint
```
The custom endpoint receives a POST request with:
{ "api_key": "", "request": { "path": "/api/v1/endpoint", "headers": { "content-type": "application/json", "user-agent": "curl/7.64.1" }, "params": { "key": ["value"] } } }
And must respond with:
{ "access_attributes": { "roles": ["admin", "user"], "teams": ["ml-team", "nlp-team"], "projects": ["llama-3", "project-x"], "namespaces": ["research"] }, "message": "Authentication successful" }
If no access attributes are returned, the token is used as a namespace.
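To check what your endpoint will receive, you can exercise it by hand; a minimal sketch with curl, where the endpoint URL is the one configured above and the token value is a placeholder:

```bash
curl -X POST https://auth.example.com/validate \
  -H "Content-Type: application/json" \
  -d '{"api_key": "<token>", "request": {"path": "/v1/providers", "headers": {}, "params": {}}}'
```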
Extending to handle Safety
Configuring Safety can be a little involved, so it is instructive to go through an example.
The Safety API works with the associated Resource called a `Shield`. Providers can support various kinds of Shields. Good examples include the Llama Guard system-safety models, or Bedrock Guardrails.
To configure a Bedrock Shield, you would need to add:
- A Safety API provider instance with type `remote::bedrock`
- A Shield resource served by this provider.
```yaml
...
providers:
  safety:
  - provider_id: bedrock
    provider_type: remote::bedrock
    config:
      aws_access_key_id: ${env.AWS_ACCESS_KEY_ID}
      aws_secret_access_key: ${env.AWS_SECRET_ACCESS_KEY}
...
shields:
- provider_id: bedrock
  params:
    guardrailVersion: ${env.GUARDRAIL_VERSION}
  provider_shield_id: ${env.GUARDRAIL_ID}
...
```
The situation is more involved if the Shield needs Inference of an associated model. This is the case with Llama Guard. In that case, you would need to add:
- A Safety API provider instance with type `inline::llama-guard`
- An Inference API provider instance for serving the model.
- A Model resource associated with this provider.
- A Shield resource served by the Safety provider.
The YAML configuration for this setup, assuming you were using vLLM as your inference server, would look like:
```yaml
...
providers:
  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config: {}
  inference:
    # this vLLM server serves the "normal" inference model (e.g., llama3.2:3b)
  - provider_id: vllm-0
    provider_type: remote::vllm
    config:
      url: ${env.VLLM_URL:http://localhost:8000}
    # this vLLM server serves the llama-guard model (e.g., llama-guard:3b)
  - provider_id: vllm-1
    provider_type: remote::vllm
    config:
      url: ${env.SAFETY_VLLM_URL:http://localhost:8001}
...
models:
- metadata: {}
  model_id: ${env.INFERENCE_MODEL}
  provider_id: vllm-0
  provider_model_id: null
- metadata: {}
  model_id: ${env.SAFETY_MODEL}
  provider_id: vllm-1
  provider_model_id: null
shields:
- provider_id: llama-guard
  shield_id: ${env.SAFETY_MODEL}  # Llama Guard shields are identified by the corresponding LlamaGuard model
  provider_shield_id: null
...
```