Quickstart | AIOS Docs (original) (raw)

For the complete documentation index, see llms.txt. This page is also available as Markdown.

  1. Getting Started

Quickstart

Be sure to complete the installation instructions before continuing with this guide.

Before starting AIOS, you need to make sure you have installed the LLM backends that you would like to run. Here are the LLM providers for supported backends for AIOS.

claude-3-5-sonnet-20241022

claude-3-5-haiku-20241022

llama-3.2-90b-vision-preview

llama-3.2-11b-vision-preview

llama3-groq-70b-8192-tool-use-preview

llama3-groq-8b-8192-tool-use-preview

Set up configuration file directly (Recommended)

You need API keys for services like OpenAI, Anthropic, Groq and HuggingFace. The simplest way to configure them is to edit the aios/config/config.yaml.

[!TIP] It is important to mention that, we strongly recommend using the aios/config/config.yaml file to set up your API keys. This method is straightforward and helps avoid potential sychronization issues with environment variables.

A simple example to set up your API keys in aios/config/config.yaml is shown below:

To obtain these API keys:

  1. Deepseek API: https://api-docs.deepseek.com/
  2. OpenAI API: https://platform.openai.com/api-keys
  3. Google Gemini API: https://makersuite.google.com/app/apikey
  4. Groq API: https://console.groq.com/keys
  5. HuggingFace Token: https://huggingface.co/settings/tokens
  6. Anthropic API: https://console.anthropic.com/keys

Configure LLM Models

You can configure which LLM models to use in the same aios/config/config.yaml file. Here's an example configuration:

Using Ollama Models:

  1. First, download ollama from https://ollama.com/
  2. Start the ollama server in a separate terminal:
  3. Pull your desired models from https://ollama.com/library:

Ollama supports both CPU-only and GPU environments. For more details about ollama usage, visit ollama documentation

Using vLLM Models:

  1. Start the vLLM server in a separate terminal:

vLLM currently only supports Linux and GPU-enabled environments. If you don't have a compatible environment, please choose other backend options. To enable the tool calling feature of vllm, refer to https://docs.vllm.ai/en/latest/features/tool\_calling.html

Using HuggingFace Models: You can configure HuggingFace models with specific GPU memory allocation:

After you setup the required keys, you can run the following command to launch the AIOS kernel.

And then you can start a client to interact with the AIOS kernel using Terminal or WebUI.

api_keys:
  openai: "your-openai-key"    
  gemini: "your-gemini-key"    
  groq: "your-groq-key"      
  anthropic: "your-anthropic-key" 
  huggingface:
    auth_token: "token to authorize specific models for use"  
    home: "path to store downloaded model weights"
llms:
  models:
    # Ollama Models
    - name: "qwen2.5:7b"
      backend: "ollama"
      hostname: "http://localhost:11434"  # Make sure to run ollama server

    # vLLM Models
    - name: "meta-llama/Llama-3.1-8B-Instruct"
      backend: "vllm"
      hostname: "http://localhost:8091/v1"  # Make sure to run vllm server
ollama pull qwen2.5:7b  # example model
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8091
- name: "meta-llama/Llama-3.1-8B-Instruct"
  backend: "huggingface"
  max_gpu_memory: {0: "24GB", 1: "24GB"}  # GPU memory allocation
  eval_device: "cuda:0"  # Device for model evaluation
bash runtime/launch_kernel.sh