Quickstart
Be sure to complete the installation instructions before continuing with this guide.
Before starting AIOS, make sure you have installed the LLM backends that you want to run. The supported backends provide models such as the following:
- claude-3-5-sonnet-20241022
- claude-3-5-haiku-20241022
- llama-3.2-90b-vision-preview
- llama-3.2-11b-vision-preview
- llama3-groq-70b-8192-tool-use-preview
- llama3-groq-8b-8192-tool-use-preview
Set up configuration file directly (Recommended)
You need API keys for services like OpenAI, Anthropic, Groq and HuggingFace. The simplest way to configure them is to edit aios/config/config.yaml.
[!TIP] We strongly recommend using the aios/config/config.yaml file to set up your API keys. This method is straightforward and helps avoid potential synchronization issues with environment variables.
A simple example to set up your API keys in aios/config/config.yaml is shown below:
To obtain these API keys:
- Deepseek API: https://api-docs.deepseek.com/
- OpenAI API: https://platform.openai.com/api-keys
- Google Gemini API: https://makersuite.google.com/app/apikey
- Groq API: https://console.groq.com/keys
- HuggingFace Token: https://huggingface.co/settings/tokens
- Anthropic API: https://console.anthropic.com/keys
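After filling in the keys, a quick check can catch placeholder values that were left in place. The sketch below is illustrative and not part of AIOS: `find_placeholder_keys` is a hypothetical helper, and it assumes the `api_keys` section has already been parsed into a Python dictionary (for example with a YAML library, which is left out here so the example stays dependency-free). Treating values that start with `your-` as placeholders is also an assumption, based on the example values in this guide.

```python
from typing import Any, Dict, List


def find_placeholder_keys(section: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted paths of entries that are empty or still look like placeholders."""
    problems: List[str] = []
    for name, value in section.items():
        path = f"{prefix}{name}"
        if isinstance(value, dict):
            # Nested sections such as `huggingface` are checked recursively.
            problems.extend(find_placeholder_keys(value, prefix=f"{path}."))
        elif (
            not isinstance(value, str)
            or not value.strip()
            or value.startswith("your-")  # assumed placeholder convention
        ):
            problems.append(path)
    return problems


if __name__ == "__main__":
    # Hand-written stand-in for the parsed `api_keys` section (values are made up).
    api_keys = {
        "openai": "sk-example",
        "gemini": "your-gemini-key",
        "groq": "",
        "huggingface": {"auth_token": "hf_example", "home": "/data/hf"},
    }
    print(find_placeholder_keys(api_keys))  # prints ['gemini', 'groq']
```

An empty result means every entry holds a non-empty value that does not match the assumed placeholder pattern. It does not verify that a key is actually valid with the provider.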
Configure LLM Models
You can configure which LLM models to use in the same aios/config/config.yaml file. Here's an example configuration:
Using Ollama Models:
- First, download ollama from https://ollama.com/
- Start the ollama server in a separate terminal by running ollama serve.
- Pull your desired models from https://ollama.com/library:
Ollama supports both CPU-only and GPU environments. For more details on using Ollama, see the Ollama documentation.
Using vLLM Models:
- Start the vLLM server in a separate terminal:
vLLM currently supports only Linux and GPU-enabled environments. If you don't have a compatible environment, choose another backend. To enable vLLM's tool calling feature, refer to https://docs.vllm.ai/en/latest/features/tool_calling.html
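Before launching AIOS, it can help to confirm that the Ollama and vLLM servers described in this guide are reachable and serving the expected model. The following is a minimal sketch, not part of AIOS: `fetch_json`, `ollama_model_names`, `vllm_model_ids`, and `check_backend` are hypothetical helper names. It assumes Ollama's `/api/tags` endpoint (which lists locally pulled models) and the OpenAI-compatible `/v1/models` endpoint served by vLLM, and it reuses the example hostnames and model names from this guide.

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional


def fetch_json(url: str, timeout: float = 3.0) -> Optional[dict]:
    """GET a URL and return the parsed JSON object, or None on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, timeouts, bad URLs, and invalid JSON.
        return None
    return data if isinstance(data, dict) else None


def ollama_model_names(tags: dict) -> List[str]:
    """Extract model names from an Ollama `/api/tags` response."""
    return [m["name"] for m in tags.get("models", []) if "name" in m]


def vllm_model_ids(models: dict) -> List[str]:
    """Extract model ids from an OpenAI-style `/v1/models` response."""
    return [m["id"] for m in models.get("data", []) if "id" in m]


def check_backend(backend: str, hostname: str, model: str) -> str:
    """Return 'ready', 'model missing', 'unreachable', or 'unsupported'."""
    base = hostname.rstrip("/")
    if backend == "ollama":
        payload = fetch_json(f"{base}/api/tags")
        names = ollama_model_names(payload) if payload is not None else None
    elif backend == "vllm":
        # The vLLM hostname in the example config already ends in /v1.
        payload = fetch_json(f"{base}/models")
        names = vllm_model_ids(payload) if payload is not None else None
    else:
        return "unsupported"
    if names is None:
        return "unreachable"
    return "ready" if model in names else "model missing"


if __name__ == "__main__":
    # Entries mirror the example model configuration in this guide.
    entries = [
        ("ollama", "http://localhost:11434", "qwen2.5:7b"),
        ("vllm", "http://localhost:8091/v1", "meta-llama/Llama-3.1-8B-Instruct"),
    ]
    for backend, hostname, model in entries:
        print(f"{backend} {model}: {check_backend(backend, hostname, model)}")
```

The comparison is an exact string match, so a model pulled without an explicit tag may be listed by Ollama under a name such as `name:latest` and be reported as missing. Parsing is kept separate from the network call so the extraction logic can be exercised without a running server.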
Using HuggingFace Models:
You can configure HuggingFace models with specific GPU memory allocation:
After you set up the required keys, run the following command to launch the AIOS kernel.
You can then start a client to interact with the AIOS kernel using Terminal or WebUI.
Example API keys in aios/config/config.yaml:

api_keys:
  openai: "your-openai-key"
  gemini: "your-gemini-key"
  groq: "your-groq-key"
  anthropic: "your-anthropic-key"
  huggingface:
    auth_token: "token to authorize specific models for use"
    home: "path to store downloaded model weights"

Example model configuration in aios/config/config.yaml:

llms:
  models:
    # Ollama Models
    - name: "qwen2.5:7b"
      backend: "ollama"
      hostname: "http://localhost:11434" # Make sure to run ollama server
    # vLLM Models
    - name: "meta-llama/Llama-3.1-8B-Instruct"
      backend: "vllm"
      hostname: "http://localhost:8091/v1" # Make sure to run vllm server

Pull an Ollama model:

ollama pull qwen2.5:7b # example model

Start the vLLM server:

vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8091

HuggingFace model entry:

- name: "meta-llama/Llama-3.1-8B-Instruct"
  backend: "huggingface"
  max_gpu_memory: {0: "24GB", 1: "24GB"} # GPU memory allocation
  eval_device: "cuda:0" # Device for model evaluation

Launch the AIOS kernel:

bash runtime/launch_kernel.sh