Contents
- Prerequisites
- Introduction
- Oumi CLI
- Training
- Evaluation
- Inference
- Launching Jobs in the Cloud
- Community
Quickstart
Prerequisites
Let's start by installing Oumi. You can install the latest stable version with the following command:
pip install oumi
Optional: if you have an Nvidia or AMD GPU, you can also install the GPU dependencies:
pip install oumi[gpu]
If you need help setting up your environment (Python, pip, Git, etc.), you can find detailed instructions in the Dev Environment Setup guide. The installation guide offers more details on how to install Oumi for your specific environment and use case.
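Before moving on, you can quickly sanity-check the installation; both commands below should succeed without errors:

oumi --help    # confirms the CLI is on your PATH
pip show oumi  # prints the installed package version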
Introduction
Now that we have Oumi installed, let's get started with the basics! We're going to use the oumi command-line interface (CLI) to train, evaluate, and run inference with a model.

We'll use a small model (SmolLM-135M) so that the examples can run fast on both CPU and GPU. SmolLM is a family of state-of-the-art small models with 135M, 360M, and 1.7B parameters, trained on a new high-quality dataset. You can learn more about them in this blog post.

For a full list of recipes, including larger models like Llama 3.2, you can explore the recipes page.
Oumi CLI
The general structure of Oumi CLI commands is:

oumi COMMAND [OPTIONS]

For detailed help on any command, you can use the --help option:

oumi --help          # for general help
oumi COMMAND --help  # for command-specific help
The available commands are:
- train
- evaluate
- infer
- launch
- judge

Let's go through some examples of each command.
Training
You can quickly start training a model using any of the existing recipes or your own custom configs. The following command will start training using the recipe in configs/recipes/smollm/sft/135m/quickstart_train.yaml:
# FFT config for SmolLM 135M Instruct.
#
# Usage:
#   oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml
#
# See Also:
#   - Documentation: https://oumi.ai/docs/en/latest/user_guides/train/train.html
#   - Config class: oumi.core.configs.TrainingConfig
#   - Config source: https://github.com/oumi-ai/oumi/blob/main/src/oumi/core/configs/training_config.py
#   - Other training configs: configs/**/pretraining/, configs/**/sft/, configs/**/dpo/

model:
  model_name: "HuggingFaceTB/SmolLM2-135M-Instruct"
  model_max_length: 2048
  torch_dtype_str: "bfloat16"
  attn_implementation: "sdpa"
  load_pretrained_weights: True
  trust_remote_code: True

data:
  train:
    datasets:
      - dataset_name: "yahma/alpaca-cleaned"
        target_col: "prompt"

training:
  trainer_type: TRL_SFT
  save_final_model: True
  save_steps: 100
  max_steps: 10
  per_device_train_batch_size: 4
  gradient_accumulation_steps: 4

  ddp_find_unused_parameters: False
  optimizer: "adamw_torch"
  learning_rate: 2.0e-05
  compile: False

  dataloader_num_workers: "auto"
  dataloader_prefetch_factor: 32

  seed: 192847
  use_deterministic: True

  logging_steps: 5
  log_model_summary: False
  empty_device_cache_steps: 50
  output_dir: "output/smollm135m.fft"
  include_performance_metrics: True
oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml
Any Oumi command which takes a config path as an argument (train, evaluate, infer, etc.) can override parameters from the command line. See the CLI Reference for more details. For example:
oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml \
  --training.max_steps 20 \
  --training.learning_rate 1e-4 \
  --data.train.datasets[0].shuffle true \
  --training.output_dir output/smollm-135m-sft
To run the same recipe on your own dataset (e.g., in our supported JSON or JSONL formats), you can override the dataset name and path. You can try this functionality out by downloading the alpaca-cleaned dataset manually via the Hugging Face CLI, then including that local path in your run.
huggingface-cli download yahma/alpaca-cleaned --repo-type dataset --local-dir /path/to/local/dataset

oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml \
  --data.train.datasets "[{dataset_name: text_sft, dataset_path: /path/to/local/dataset}]" \
  --training.output_dir output/smollm-135m-sft-custom
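For reference, records in yahma/alpaca-cleaned use the alpaca-style schema, so a dataset you bring yourself would contain records shaped roughly like this (an illustrative example, not taken from the actual dataset; check the dataset formats documentation for the exact schemas text_sft accepts):

{"instruction": "Name the capital of France.", "input": "", "output": "The capital of France is Paris."}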
You can also train on multiple GPUs (make sure to install the GPU dependencies if not already installed).
For example, if you have a machine with 4 GPUs, you can run this command to launch a local distributed training run:
oumi distributed torchrun \
  -m oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml \
  --training.output_dir output/smollm-135m-sft-dist
You can also use torchrun directly in standalone mode:
torchrun --standalone --nproc-per-node 4 --log-dir ./logs \
  -m oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml \
  --training.output_dir output/smollm-135m-sft-dist
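The --nproc-per-node value should match the number of GPUs on your machine; one quick way to count them (assuming the NVIDIA driver utilities are installed) is:

nvidia-smi -L | wc -l  # lists one line per visible NVIDIA GPU, then counts them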
Evaluation
To evaluate a trained model:
# Quickstart eval config for SmolLM 135M Instruct.
#
# Usage:
#   oumi evaluate -c configs/recipes/smollm/evaluation/135m/quickstart_eval.yaml
#
# See Also:
#   - Documentation: https://oumi.ai/docs/en/latest/user_guides/evaluate/evaluate.html
#   - Config class: oumi.core.configs.EvaluationConfig
#   - Config source: https://github.com/oumi-ai/oumi/blob/main/src/oumi/core/configs/evaluation_config.py
#   - Other eval configs: configs/**/evaluation/

model:
  model_name: "HuggingFaceTB/SmolLM2-135M-Instruct"
  model_max_length: 2048
  torch_dtype_str: "bfloat16"
  attn_implementation: "sdpa"
  load_pretrained_weights: True
  trust_remote_code: True

generation:
  batch_size: 4

tasks:
  # For all available tasks, see https://oumi.ai/docs/en/latest/user_guides/evaluate/evaluate.html
  - evaluation_backend: lm_harness
    task_name: mmlu_college_computer_science
    eval_kwargs:
      num_fewshot: 5
Using a model downloaded from HuggingFace:
oumi evaluate -c configs/recipes/smollm/evaluation/135m/quickstart_eval.yaml \
  --model.model_name HuggingFaceTB/SmolLM2-135M-Instruct
Or, with our newly trained model saved on disk:
oumi evaluate -c configs/recipes/smollm/evaluation/135m/quickstart_eval.yaml \
  --model.model_name output/smollm135m.fft
If you saved your model to a different directory, such as output/smollm-135m-sft-dist, you need only change --model.model_name.
To explore the benchmarks that our evaluations support, including HuggingFace leaderboards and AlpacaEval, visit our evaluation guide.
Inference
To run inference with a trained model:
# Inference config for SmolLM 135M Instruct.
#
# Usage:
#   oumi infer -i -c configs/recipes/smollm/inference/135m_infer.yaml
#
# See Also:
#   - Documentation: https://oumi.ai/docs/en/latest/user_guides/infer/infer.html
#   - Config class: oumi.core.configs.InferenceConfig
#   - Config source: https://github.com/oumi-ai/oumi/blob/main/src/oumi/core/configs/inference_config.py
#   - Other inference configs: configs/**/inference/

model:
  model_name: "HuggingFaceTB/SmolLM2-135M-Instruct"
  model_max_length: 2048
  torch_dtype_str: "bfloat16"
  attn_implementation: "sdpa"
  load_pretrained_weights: True
  trust_remote_code: True

generation:
  max_new_tokens: 100
  batch_size: 4

engine: NATIVE
Using a model downloaded from HuggingFace:
oumi infer -c configs/recipes/smollm/inference/135m_infer.yaml \
  --generation.max_new_tokens 40 \
  --generation.temperature 0.7 \
  --interactive
Or, with our newly trained model saved on disk:
oumi infer -c configs/recipes/smollm/inference/135m_infer.yaml \
  --model.model_name output/smollm135m.fft \
  --generation.max_new_tokens 40 \
  --generation.temperature 0.7 \
  --interactive
To learn more about running inference locally or remotely (including OpenAI, Google, Anthropic APIs) and leveraging inference engines to parallelize and speed up your jobs, visit our inference guide.
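The engine: NATIVE field in the config above selects the inference backend. Oumi supports other engines as well; for instance, you could try a vLLM-backed run via a CLI override (a sketch; the VLLM engine value and a working vLLM install are assumptions here, so check the inference guide for the engines available in your version):

oumi infer -c configs/recipes/smollm/inference/135m_infer.yaml \
  --engine VLLM \
  --interactive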
Launching Jobs in the Cloud
So far we have been using Oumi locally, but one of the most exciting and unique Oumi features, compared to similar frameworks, is its integrated ability to launch jobs directly to the cloud (GCP, AWS, Azure, etc.).
This section of the quickstart is a little different from the others, so please read the next bit carefully before you proceed.
This tutorial uses GCP, so you'll need a GCP account. You can also use other cloud providers, such as AWS, Azure, etc.; see running jobs remotely for more details. In particular, Oumi uses SkyPilot, and the recommended way to use SkyPilot with GCP is via a GCP service account.

You will need to install Oumi with GCP support:

pip install oumi[gcp]

Please note that we recommend setting up a separate environment for each cloud provider you wish to use. Depending on your precise use case, you may also need to install a few other packages from Google:
conda install -c conda-forge google-cloud-sdk -y
conda install -c conda-forge google-api-python-client -y
conda install -c conda-forge google-cloud-storage -y
There are multiple ways to handle credentials with GCP service accounts. We recommend creating a service account key in JSON format, then downloading it to the machine from which you plan to launch the cloud job. After that, you'll need to run a few more setup commands:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS
gcloud config set project <PROJECT_ID>  # replace <PROJECT_ID> with your GCP project ID
You can now run sky check to confirm GCP is enabled.
If you get stuck, please refer to our running jobs remotely section, as well as the documentation for GCP and SkyPilot linked above, for more information.
Launching your first cloud job with Oumi
Once the one-time setup is out of the way, launching a new cloud job with Oumi is very simple.
# Job config to tune SmolLM 135M on 1 GCP node.
#
# Usage:
#   oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml --cluster smollm-135m-fft
#
# See Also:
#   - Documentation: https://oumi.ai/docs/en/latest/user_guides/launch/launch.html
#   - Config class: oumi.core.configs.JobConfig
#   - Config source: https://github.com/oumi-ai/oumi/blob/main/src/oumi/core/configs/job_config.py
#   - Other job configs: configs/**/*job.yaml

name: smollm-135m-sft

resources:
  cloud: gcp
  accelerators: "A100:1"
  use_spot: false
  disk_size: 100  # Disk size in GBs

working_dir: .

envs:
  OUMI_RUN_NAME: smollm135m.train
  # https://github.com/huggingface/tokenizers/issues/899#issuecomment-1027739758
  TOKENIZERS_PARALLELISM: false

setup: |
  set -e
  pip install uv && uv pip install oumi[gpu]

run: |
  set -e  # Exit if any command failed.
  source ./configs/examples/misc/sky_init.sh

  set -x
  oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml

  echo "Training complete!"
oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml
To launch an evaluation job:
# Job config to evaluate SmolLM 135M on 1 GCP node.
#
# Usage:
#   oumi launch up -c configs/recipes/smollm/evaluation/135m/quickstart_gcp_job.yaml --cluster smollm-135m-eval
#
# See Also:
#   - Documentation: https://oumi.ai/docs/en/latest/user_guides/launch/launch.html
#   - Config class: oumi.core.configs.JobConfig
#   - Config source: https://github.com/oumi-ai/oumi/blob/main/src/oumi/core/configs/job_config.py
#   - Other job configs: configs/**/*job.yaml

name: smollm-135m-eval

resources:
  cloud: gcp
  accelerators: "A100:1"
  use_spot: false
  disk_size: 100  # Disk size in GBs

working_dir: .

envs:
  OUMI_RUN_NAME: smollm135m.eval
  # https://github.com/huggingface/tokenizers/issues/899#issuecomment-1027739758
  TOKENIZERS_PARALLELISM: false

setup: |
  set -e
  pip install uv && uv pip install oumi[gpu,evaluation]

run: |
  set -e  # Exit if any command failed.
  source ./configs/examples/misc/sky_init.sh

  set -x
  oumi evaluate -c configs/recipes/smollm/evaluation/135m/quickstart_eval.yaml

  echo "Evaluation complete!"
oumi launch up -c configs/recipes/smollm/evaluation/135m/quickstart_gcp_job.yaml
After you run one of the above commands, you should see some console output from Oumi describing how your job is being provisioned and how the cloud installation is proceeding. In particular, your cluster will be assigned a semi-random name such as sky-7fdd-ab183, which you should take note of.
After 15 minutes or so, Oumi should tell you that the run is complete.
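While you wait, you can monitor the cluster and its jobs with standard SkyPilot commands (these come from the SkyPilot CLI, not Oumi itself; substitute your own cluster name):

sky status               # list your clusters and their state
sky queue sky-7fdd-ab183 # show the jobs running on the cluster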
If you want to see the logs from your cloud run, you can pull them down to your local machine:
sky logs --sync-down sky-7fdd-ab183
Cloud services can be expensive! Please keep an eye on your costs, and don't forget to tear down your cluster when you're done with this tutorial.
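For example, with the auto-assigned cluster name from above:

sky down sky-7fdd-ab183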
This command will destroy your cluster, including all data on those remote machines, so save your logs and artifacts first!
What's next?
Although this example used GCP, Oumi natively supports a wide range of cloud providers. To explore the cloud providers that we support, visit running jobs remotely.