Contents
- Prerequisites
- Introduction
- Oumi CLI
- Training
- Evaluation
- Inference
- Launching Jobs in the Cloud
- Community
Quickstart
Prerequisites
Let's start by installing Oumi. You can install the latest stable version with the following command:
pip install oumi
Optional: if you have an Nvidia or AMD GPU, you can also install the GPU dependencies:
pip install oumi[gpu]
If you need help setting up your environment (Python, pip, Git, etc.), you can find detailed instructions in the Dev Environment Setup guide. The installation guide offers more details on how to install Oumi for your specific environment and use case.
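Before moving on, you can quickly sanity-check the installation; both commands below should succeed without errors:

oumi --help    # confirms the CLI is on your PATH
pip show oumi  # prints the installed package version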
Introduction
Now that we have Oumi installed, let's get started with the basics! We're going to use the oumi command-line interface (CLI) to train, evaluate, and run inference with a model.

We'll use a small model (SmolLM-135M) so that the examples can run fast on both CPU and GPU. SmolLM is a family of state-of-the-art small models with 135M, 360M, and 1.7B parameters, trained on a new high-quality dataset. You can learn more about them in this blog post.

For a full list of recipes, including larger models like Llama 3.2, you can explore the recipes page.
Oumi CLI
The general structure of Oumi CLI commands is:

oumi COMMAND [OPTIONS]

For detailed help on any command, you can use the --help option:

oumi --help          # for general help
oumi COMMAND --help  # for command-specific help
The available commands are:
- train
- evaluate
- infer
- launch
- judge

Let's go through some examples of each command.
Training
You can quickly start training a model using any of the existing recipes or your own custom configs. The following command will start training using the recipe in configs/recipes/smollm/sft/135m/quickstart_train.yaml:
# FFT config for SmolLM 135M Instruct.
#
# Usage:
#   oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml
#
# See Also:
#   - Documentation: https://oumi.ai/docs/en/latest/user_guides/train/train.html
#   - Config class: oumi.core.configs.TrainingConfig
#   - Config source: https://github.com/oumi-ai/oumi/blob/main/src/oumi/core/configs/training_config.py
#   - Other training configs: configs/**/pretraining/, configs/**/sft/, configs/**/dpo/

model:
  model_name: "HuggingFaceTB/SmolLM2-135M-Instruct"
  model_max_length: 2048
  torch_dtype_str: "bfloat16"
  attn_implementation: "sdpa"
  load_pretrained_weights: True
  trust_remote_code: True

data:
  train:
    datasets:
      - dataset_name: "yahma/alpaca-cleaned"
        target_col: "prompt"

training:
  trainer_type: TRL_SFT
  save_final_model: True
  save_steps: 100
  max_steps: 10
  per_device_train_batch_size: 4
  gradient_accumulation_steps: 4

  ddp_find_unused_parameters: False
  optimizer: "adamw_torch"
  learning_rate: 2.0e-05
  compile: False

  dataloader_num_workers: "auto"
  dataloader_prefetch_factor: 32

  seed: 192847
  use_deterministic: True

  logging_steps: 5
  log_model_summary: False
  empty_device_cache_steps: 50
  output_dir: "output/smollm135m.fft"
  include_performance_metrics: True
oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml
Any Oumi command which takes a config path as an argument (train, evaluate, infer, etc.) can override parameters from the command line. See the CLI Reference for more details. For example:
oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml \
  --training.max_steps 20 \
  --training.learning_rate 1e-4 \
  --data.train.datasets[0].shuffle true \
  --training.output_dir output/smollm-135m-sft
To run the same recipe on your own dataset (e.g., in our supported JSON or JSONL formats), you can override the dataset name and path. You can try this functionality out by downloading the alpaca-cleaned dataset manually via the Hugging Face CLI, then including that local path in your run.
huggingface-cli download yahma/alpaca-cleaned --repo-type dataset --local-dir /path/to/local/dataset

oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml \
  --data.train.datasets "[{dataset_name: text_sft, dataset_path: /path/to/local/dataset}]" \
  --training.output_dir output/smollm-135m-sft-custom
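For reference, records in yahma/alpaca-cleaned use the alpaca-style schema, so a dataset you bring yourself would contain records shaped roughly like this (an illustrative example, not taken from the actual dataset; check the dataset formats documentation for the exact schemas text_sft accepts):

{"instruction": "Name the capital of France.", "input": "", "output": "The capital of France is Paris."}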
You can also train on multiple GPUs (make sure to install the GPU dependencies if not already installed).
For example, if you have a machine with 4 GPUs, you can run this command to launch a local distributed training run:
oumi distributed torchrun \
  -m oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml \
  --training.output_dir output/smollm-135m-sft-dist
You can also use torchrun directly in standalone mode:
torchrun --standalone --nproc-per-node 4 --log-dir ./logs \
  -m oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml \
  --training.output_dir output/smollm-135m-sft-dist
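The --nproc-per-node value should match the number of GPUs on your machine; one quick way to count them (assuming the NVIDIA driver utilities are installed) is:

nvidia-smi -L | wc -l  # lists one line per visible NVIDIA GPU, then counts them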
Evaluation
To evaluate a trained model:
# Quickstart eval config for SmolLM 135M Instruct.
#
# Usage:
#   oumi evaluate -c configs/recipes/smollm/evaluation/135m/quickstart_eval.yaml
#
# See Also:
#   - Documentation: https://oumi.ai/docs/en/latest/user_guides/evaluate/evaluate.html
#   - Config class: oumi.core.configs.EvaluationConfig
#   - Config source: https://github.com/oumi-ai/oumi/blob/main/src/oumi/core/configs/evaluation_config.py
#   - Other eval configs: configs/**/evaluation/

model:
  model_name: "HuggingFaceTB/SmolLM2-135M-Instruct"
  model_max_length: 2048
  torch_dtype_str: "bfloat16"
  attn_implementation: "sdpa"
  load_pretrained_weights: True
  trust_remote_code: True

generation:
  batch_size: 4

tasks:
  # For all available tasks, see https://oumi.ai/docs/en/latest/user_guides/evaluate/evaluate.html
  - evaluation_backend: lm_harness
    task_name: mmlu_college_computer_science
    eval_kwargs:
      num_fewshot: 5
Using a model downloaded from HuggingFace:
oumi evaluate -c configs/recipes/smollm/evaluation/135m/quickstart_eval.yaml \
  --model.model_name HuggingFaceTB/SmolLM2-135M-Instruct
Or, with our newly trained model saved on disk:
oumi evaluate -c configs/recipes/smollm/evaluation/135m/quickstart_eval.yaml \
  --model.model_name output/smollm135m.fft
If you saved your model to a different directory, such as output/smollm-135m-sft-dist, you need only change --model.model_name.
To explore the benchmarks that our evaluations support, including HuggingFace leaderboards and AlpacaEval, visit our evaluation guide.
Inference
To run inference with a trained model:
# Inference config for SmolLM 135M Instruct.
#
# Usage:
#   oumi infer -i -c configs/recipes/smollm/inference/135m_infer.yaml
#
# See Also:
#   - Documentation: https://oumi.ai/docs/en/latest/user_guides/infer/infer.html
#   - Config class: oumi.core.configs.InferenceConfig
#   - Config source: https://github.com/oumi-ai/oumi/blob/main/src/oumi/core/configs/inference_config.py
#   - Other inference configs: configs/**/inference/

model:
  model_name: "HuggingFaceTB/SmolLM2-135M-Instruct"
  model_max_length: 2048
  torch_dtype_str: "bfloat16"
  attn_implementation: "sdpa"
  load_pretrained_weights: True
  trust_remote_code: True

generation:
  max_new_tokens: 100
  batch_size: 4

engine: NATIVE
Using a model downloaded from HuggingFace:
oumi infer -c configs/recipes/smollm/inference/135m_infer.yaml \
  --generation.max_new_tokens 40 \
  --generation.temperature 0.7 \
  --interactive
Or, with our newly trained model saved on disk:
oumi infer -c configs/recipes/smollm/inference/135m_infer.yaml \
  --model.model_name output/smollm135m.fft \
  --generation.max_new_tokens 40 \
  --generation.temperature 0.7 \
  --interactive
To learn more about running inference locally or remotely (including OpenAI, Google, Anthropic APIs) and leveraging inference engines to parallelize and speed up your jobs, visit our inference guide.
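The engine: NATIVE field in the config above selects the inference backend. Oumi supports other engines as well; for instance, you could try a vLLM-backed run via a CLI override (a sketch; the VLLM engine value and a working vLLM install are assumptions here, so check the inference guide for the engines available in your version):

oumi infer -c configs/recipes/smollm/inference/135m_infer.yaml \
  --engine VLLM \
  --interactive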
Launching Jobs in the Cloud
So far we have been using Oumi locally, but one of the most exciting and unique Oumi features, compared to similar frameworks, is its integrated ability to launch jobs directly to the cloud (GCP, AWS, Azure, etc.).
This section of the quickstart is a little different from the others, so please read the next bit carefully before you proceed.
This tutorial uses GCP, so you'll need a GCP account. You can also use other cloud providers, such as AWS, Azure, etc.; see running jobs remotely for more details. In particular, Oumi uses SkyPilot, and the recommended way to use SkyPilot with GCP is via a GCP service account.

You will need to install Oumi with GCP support:

pip install oumi[gcp]

Please note that we recommend setting up a separate environment for each cloud provider you wish to use. Depending on your precise use case, you may also need to install a few other packages from Google:
conda install -c conda-forge google-cloud-sdk -y
conda install -c conda-forge google-api-python-client -y
conda install -c conda-forge google-cloud-storage -y
There are multiple ways to handle credentials with GCP service accounts. We recommend creating a service account key in JSON format, then downloading it to the machine from which you plan to launch the cloud job. After that, you'll need to run a few more setup commands:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS
gcloud config set project <PROJECT_ID>  # replace <PROJECT_ID> with your GCP project ID
You can now run sky check to confirm GCP is enabled.
If you get stuck, please refer to our running jobs remotely section, as well as the documentation for GCP and SkyPilot linked above, for more information.
Launching your first cloud job with Oumi
Once the one-time setup is out of the way, launching a new cloud job with Oumi is very simple.
# Job config to tune SmolLM 135M on 1 GCP node.
#
# Usage:
#   oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml --cluster smollm-135m-fft
#
# See Also:
#   - Documentation: https://oumi.ai/docs/en/latest/user_guides/launch/launch.html
#   - Config class: oumi.core.configs.JobConfig
#   - Config source: https://github.com/oumi-ai/oumi/blob/main/src/oumi/core/configs/job_config.py
#   - Other job configs: configs/**/*job.yaml

name: smollm-135m-sft

resources:
  cloud: gcp
  accelerators: "A100:1"
  use_spot: false
  disk_size: 100  # Disk size in GBs

working_dir: .

envs:
  OUMI_RUN_NAME: smollm135m.train
  # https://github.com/huggingface/tokenizers/issues/899#issuecomment-1027739758
  TOKENIZERS_PARALLELISM: false

setup: |
  set -e
  pip install uv && uv pip install oumi[gpu]

run: |
  set -e  # Exit if any command failed.
  source ./configs/examples/misc/sky_init.sh

  set -x
  oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml

  echo "Training complete!"
oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml
To launch an evaluation job:
# Job config to evaluate SmolLM 135M on 1 GCP node.
#
# Usage:
#   oumi launch up -c configs/recipes/smollm/evaluation/135m/quickstart_gcp_job.yaml --cluster smollm-135m-eval
#
# See Also:
#   - Documentation: https://oumi.ai/docs/en/latest/user_guides/launch/launch.html
#   - Config class: oumi.core.configs.JobConfig
#   - Config source: https://github.com/oumi-ai/oumi/blob/main/src/oumi/core/configs/job_config.py
#   - Other job configs: configs/**/*job.yaml

name: smollm-135m-eval

resources:
  cloud: gcp
  accelerators: "A100:1"
  use_spot: false
  disk_size: 100  # Disk size in GBs

working_dir: .

envs:
  OUMI_RUN_NAME: smollm135m.eval
  # https://github.com/huggingface/tokenizers/issues/899#issuecomment-1027739758
  TOKENIZERS_PARALLELISM: false

setup: |
  set -e
  pip install uv && uv pip install oumi[gpu,evaluation]

run: |
  set -e  # Exit if any command failed.
  source ./configs/examples/misc/sky_init.sh

  set -x
  oumi evaluate -c configs/recipes/smollm/evaluation/135m/quickstart_eval.yaml

  echo "Evaluation complete!"
oumi launch up -c configs/recipes/smollm/evaluation/135m/quickstart_gcp_job.yaml
After you run one of the above commands, you should see some console output from Oumi describing how your job is being provisioned and how the cloud installation is proceeding. In particular, your cluster will be assigned a semi-random name such as sky-7fdd-ab183, which you should take note of.
After 15 minutes or so, Oumi should tell you that the run is complete.
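While you wait, you can monitor the cluster and its jobs with standard SkyPilot commands (these come from the SkyPilot CLI, not Oumi itself; substitute your own cluster name):

sky status               # list your clusters and their state
sky queue sky-7fdd-ab183 # show the jobs running on the cluster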
If you want to see the logs from your cloud run, you can pull them down to your local machine:
sky logs --sync-down sky-7fdd-ab183
Cloud services can be expensive! Please keep an eye on your costs, and don't forget to tear down your cluster when you're done with this tutorial.
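For example, with the auto-assigned cluster name from above:

sky down sky-7fdd-ab183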
This command will destroy your cluster, including all data on those remote machines, so save your logs and artifacts first!
What's next?
Although this example used GCP, Oumi natively supports a wide range of cloud providers. To explore the cloud providers that we support, visit running jobs remotely.