
llama-stack-client-swift

llama-stack-client-swift brings the inference and agents APIs of Llama Stack to iOS.

Compatible with: Llama Stack 0.2.2 (the version pinned in the install step below)

Features

iOS Demos

We have several demo apps that show how to use the SDK.

Installation

  1. Click "Xcode > File > Add Package Dependencies...".
  2. Add this repo URL at the top right: https://github.com/meta-llama/llama-stack-client-swift, then click Add Package.
  3. Select and add llama-stack-client-swift to your app target.
  4. On the first build: Enable & Trust the OpenAPIGenerator extension when prompted.
  5. The quickest way to try out the demo for remote inference is to use Together.ai's hosted Llama Stack distro at https://llama-stack.together.ai; if you do, you can skip Step 6. Note that Llama 4 is currently supported only by building your own distro from the Llama Stack PIP package or from main.
  6. (Optional) Set up a remote Llama Stack distribution. This step assumes you have a Fireworks or Together API key, which you can get from their websites:
conda create -n llama-stack python=3.10
conda activate llama-stack
pip install --no-cache llama-stack==0.2.2 llama-models==0.2.0 llama-stack-client==0.2.2

Then, either:

llama stack build --template fireworks --image-type conda
export FIREWORKS_API_KEY="<your_fireworks_api_key>"
llama stack run fireworks

or

llama stack build --template together --image-type conda
export TOGETHER_API_KEY="<your_together_api_key>"
llama stack run together

The default port for llama stack run is 8321; to use a different port, append --port <your_port> to llama stack run fireworks|together.

Replace the RemoteInference URL string below with the host IP and port of the remote Llama Stack distro from Step 6:

import LlamaStackClient

let inference = RemoteInference(url: URL(string: "https://llama-stack.together.ai")!)
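
If you are running your own distro from Step 6 instead, point the client at its host and port (8321 by default, per the note above). A minimal sketch, assuming a distro running on the same machine:

import LlamaStackClient

// Assumes a local distro started with `llama stack run` on the default port 8321;
// substitute your own host and port as needed.
let inference = RemoteInference(url: URL(string: "http://127.0.0.1:8321")!)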

  7. Build and run the iOS demo.

Below is an example code snippet to use the Llama Stack inference API. See the iOS Demos above for complete code.

var message = ""

for await chunk in try await inference.chatCompletion(
  request: Components.Schemas.ChatCompletionRequest(
    model_id: "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages: [
      .user(
        Components.Schemas.UserMessage(
          role: .user,
          content: .InterleavedContentItem(
            .text(Components.Schemas.TextContentItem(
              _type: .text,
              text: userInput
            ))
          )
        )
      )
    ],
    stream: true)
) {
  switch chunk.event.delta {
  case .text(let s):
    // Append each streamed text delta to the running reply.
    message += s.text
  case .image(let s):
    print("> \(s)")
  case .tool_call(let s):
    print("> \(s)")
  }
}
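
To wire this into an app, one way is to wrap the loop in a helper that returns the assembled reply. Below is a minimal sketch: the chatCompletion call and schema types are exactly those from the snippet above, while the function itself (runChat) is illustrative and not part of the SDK.

import LlamaStackClient

// Illustrative helper (not part of the SDK): runs one streamed chat
// completion and returns the concatenated text deltas as a single string.
func runChat(inference: RemoteInference, userInput: String) async throws -> String {
  var message = ""
  for await chunk in try await inference.chatCompletion(
    request: Components.Schemas.ChatCompletionRequest(
      model_id: "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
      messages: [
        .user(Components.Schemas.UserMessage(
          role: .user,
          content: .InterleavedContentItem(
            .text(Components.Schemas.TextContentItem(_type: .text, text: userInput))
          )
        ))
      ],
      stream: true)
  ) {
    switch chunk.event.delta {
    case .text(let s):
      message += s.text   // accumulate streamed text
    case .image, .tool_call:
      break               // this sketch only collects text deltas
    }
  }
  return message
}

// Usage: let reply = try await runChat(inference: inference, userInput: "Hello")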