Quickstart

Be sure to complete the installation instructions before continuing with this guide.

Before starting AIOS, make sure you have installed the LLM backends you want to run. The supported backends include the following models:

claude-3-5-sonnet-20241022

claude-3-5-haiku-20241022

llama-3.2-90b-vision-preview

llama-3.2-11b-vision-preview

llama3-groq-70b-8192-tool-use-preview

llama3-groq-8b-8192-tool-use-preview

Set up configuration file directly (Recommended)

You need API keys for services like OpenAI, Anthropic, Groq and HuggingFace. The simplest way to configure them is to edit aios/config/config.yaml.

[!TIP] We strongly recommend using the aios/config/config.yaml file to set up your API keys. This method is straightforward and helps avoid potential synchronization issues with environment variables.

A simple example of setting up your API keys in aios/config/config.yaml is the api_keys block at the end of this page.

To obtain these API keys:

Deepseek API: https://api-docs.deepseek.com/
OpenAI API: https://platform.openai.com/api-keys
Google Gemini API: https://makersuite.google.com/app/apikey
Groq API: https://console.groq.com/keys
HuggingFace Token: https://huggingface.co/settings/tokens
Anthropic API: https://console.anthropic.com/keys
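
Once the keys are filled in, a quick check can confirm that no placeholder values remain in the file. The sketch below assumes the placeholders follow the "your-<provider>-key" pattern used in this guide's example; find_placeholder_keys is a hypothetical helper name, not part of AIOS.

```shell
# Hypothetical helper: list lines of a config file that still hold
# placeholder values such as "your-openai-key".
find_placeholder_keys() {
  # $1: path to the config file (defaults to the path used in this guide)
  grep -nE '"your-[a-z]+-key"' "${1:-aios/config/config.yaml}"
}

# Example call (prints nothing and returns non-zero once no placeholders remain):
# find_placeholder_keys aios/config/config.yaml
```

Each match is printed with its line number, so leftover placeholders are easy to locate and replace.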

Configure LLM Models

You can configure which LLM models to use in the same aios/config/config.yaml file. The llms block at the end of this page shows an example configuration.

Using Ollama Models:

First, download ollama from https://ollama.com/.
Start the ollama server in a separate terminal by running ollama serve.
Pull your desired models from https://ollama.com/library with ollama pull.

Ollama supports both CPU-only and GPU environments. For more details about ollama usage, visit the ollama documentation.
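
To confirm that a pulled model is available before pointing AIOS at it, one option is to look for its name in the output of ollama list. This is a minimal sketch: ollama_has_model is a hypothetical helper, and it assumes the first column of the ollama list output holds the model name.

```shell
# Hypothetical helper: succeed if the named model appears in `ollama list`.
ollama_has_model() {
  # Skip the header row, take the first (name) column, and look for an exact match.
  ollama list | awk 'NR > 1 { print $1 }' | grep -qxF "$1"
}

# Example:
# ollama_has_model qwen2.5:7b && echo "model available"
```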

Using vLLM Models:

Start the vLLM server in a separate terminal with vllm serve, passing the model name and the port.

vLLM currently supports only Linux and GPU-enabled environments. If you don't have a compatible environment, please choose another backend. To enable vLLM's tool calling feature, refer to https://docs.vllm.ai/en/latest/features/tool_calling.html.
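
Loading a model can take a while, so it may help to wait until the server responds before launching AIOS. The sketch below polls the /v1/models route of vLLM's OpenAI-compatible server. wait_for_http is a hypothetical helper, and port 8091 is taken from the example configuration on this page.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# $1: URL, $2: max attempts (default 30), $3: seconds between attempts (default 2)
wait_for_http() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors; all output is discarded.
    if curl -fs --max-time 5 "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example:
# wait_for_http http://localhost:8091/v1/models && echo "vLLM is ready"
```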

Using HuggingFace Models:

You can configure HuggingFace models with a specific GPU memory allocation per device through max_gpu_memory, and choose the evaluation device through eval_device.
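
When choosing values for max_gpu_memory, it can help to see how much memory each GPU reports. The sketch below uses nvidia-smi's query mode and assumes NVIDIA GPUs with the driver tools installed; list_gpu_memory is a hypothetical helper name.

```shell
# Hypothetical helper: print each GPU index with its total memory in MiB.
list_gpu_memory() {
  nvidia-smi --query-gpu=index,memory.total --format=csv,noheader,nounits |
    awk -F', ' '{ printf "cuda:%s %s MiB\n", $1, $2 }'
}

# Example:
# list_gpu_memory
```

The per-device totals give an upper bound for the values placed in the max_gpu_memory map.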

After you set up the required keys, launch the AIOS kernel by running the runtime/launch_kernel.sh script with bash.

You can then start a client to interact with the AIOS kernel, using either the Terminal or the WebUI.

Example API key configuration in aios/config/config.yaml:

api_keys:
  openai: "your-openai-key"
  gemini: "your-gemini-key"
  groq: "your-groq-key"
  anthropic: "your-anthropic-key"
  huggingface:
    auth_token: "token to authorize specific models for use"
    home: "path to store downloaded model weights"

Example model configuration in the same file:

llms:
  models:
    # Ollama Models
    - name: "qwen2.5:7b"
      backend: "ollama"
      hostname: "http://localhost:11434"  # Make sure to run ollama server

    # vLLM Models
    - name: "meta-llama/Llama-3.1-8B-Instruct"
      backend: "vllm"
      hostname: "http://localhost:8091/v1"  # Make sure to run vllm server

ollama pull qwen2.5:7b  # example model

vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8091

Example HuggingFace model entry:

- name: "meta-llama/Llama-3.1-8B-Instruct"
  backend: "huggingface"
  max_gpu_memory: {0: "24GB", 1: "24GB"}  # GPU memory allocation
  eval_device: "cuda:0"  # Device for model evaluation

bash runtime/launch_kernel.sh