Integrations — Stable Baselines3 2.7.0a0 documentation

Weights & Biases

Weights & Biases provides a callback for experiment tracking that allows you to visualize and share results.

The full documentation is available here: https://docs.wandb.ai/guides/integrations/other/stable-baselines-3

import gymnasium as gym
import wandb
from wandb.integration.sb3 import WandbCallback

from stable_baselines3 import PPO

config = {
    "policy_type": "MlpPolicy",
    "total_timesteps": 25000,
    "env_id": "CartPole-v1",
}
run = wandb.init(
    project="sb3",
    config=config,
    sync_tensorboard=True,  # auto-upload sb3's tensorboard metrics
    # monitor_gym=True,  # auto-upload the videos of agents playing the game
    # save_code=True,  # optional
)

model = PPO(config["policy_type"], config["env_id"], verbose=1, tensorboard_log=f"runs/{run.id}")
model.learn(
    total_timesteps=config["total_timesteps"],
    callback=WandbCallback(
        model_save_path=f"models/{run.id}",
        verbose=2,
    ),
)
run.finish()
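The commented-out monitor_gym option only uploads videos if the environment actually records them. As a minimal sketch (not part of the official example; the folder names and trigger values below are illustrative, and the environment must be created with render_mode="rgb_array"), you could wrap the vectorized environment in VecVideoRecorder so that short clips are saved and picked up by Weights & Biases:

import gymnasium as gym
import wandb
from wandb.integration.sb3 import WandbCallback

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecVideoRecorder

run = wandb.init(project="sb3", sync_tensorboard=True, monitor_gym=True)

# render_mode="rgb_array" lets frames be captured off-screen for the recorder
env = DummyVecEnv([lambda: gym.make("CartPole-v1", render_mode="rgb_array")])
# Record a 200-step clip every 2000 steps (trigger and length are arbitrary choices)
env = VecVideoRecorder(
    env,
    f"videos/{run.id}",
    record_video_trigger=lambda step: step % 2000 == 0,
    video_length=200,
)

model = PPO("MlpPolicy", env, verbose=1, tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=10_000, callback=WandbCallback(verbose=2))
run.finish()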

Hugging Face 🤗

The Hugging Face Hub 🤗 is a central place where anyone can share and explore models. It allows you to host your saved models 💾.

You can see the list of stable-baselines3 saved models here: https://huggingface.co/models?library=stable-baselines3
Most of them are available via the RL Zoo.

Official pre-trained models are saved in the SB3 organization on the hub: https://huggingface.co/sb3

We wrote a tutorial on how to use the 🤗 Hub and Stable-Baselines3 here.

Installation

pip install huggingface_sb3

Note

If you use the RL Zoo, pushing and loading models from the hub is already integrated:

Download model and save it into the logs/ folder

Only use TRUST_REMOTE_CODE=True with HF models that can be trusted (here the SB3 organization)

TRUST_REMOTE_CODE=True python -m rl_zoo3.load_from_hub --algo a2c --env LunarLander-v3 -orga sb3 -f logs/

Test the agent

python -m rl_zoo3.enjoy --algo a2c --env LunarLander-v3 -f logs/

Push model, config and hyperparameters to the hub

python -m rl_zoo3.push_to_hub --algo a2c --env LunarLander-v3 -f logs/ -orga sb3 -m "Initial commit"

Download a model from the Hub

You need to copy the repo-id that contains your saved model. For instance sb3/demo-hf-CartPole-v1:

import os

import gymnasium as gym

from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Allow the use of pickle.load() when downloading model from the hub
# Please make sure that the organization from which you download can be trusted
os.environ["TRUST_REMOTE_CODE"] = "True"

# Retrieve the model from the hub
# repo_id = id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name})
# filename = name of the model zip file from the repository
checkpoint = load_from_hub(
    repo_id="sb3/demo-hf-CartPole-v1",
    filename="ppo-CartPole-v1.zip",
)
model = PPO.load(checkpoint)

# Evaluate the agent and watch it
eval_env = gym.make("CartPole-v1")
mean_reward, std_reward = evaluate_policy(
    model, eval_env, render=True, n_eval_episodes=5, deterministic=True, warn=False
)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

To download a model, you need to define two parameters:

  1. repo_id: the id of the model repository on the Hugging Face Hub ({organization}/{repo_name}), for example sb3/demo-hf-CartPole-v1.
  2. filename: the name of the model zip file inside that repository, for example ppo-CartPole-v1.zip.

Upload a model to the Hub

You can easily upload your models using two different functions:

  1. package_to_hub(): save the model, evaluate it, generate a model card and record a replay video of your agent before pushing the complete repo to the Hub.
  2. push_to_hub(): simply push a file to the Hub.

First, you need to be logged in to Hugging Face to upload a model:

from huggingface_hub import notebook_login

notebook_login()
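If you are not running inside a notebook, you can log in from a terminal instead (this is the standard Hugging Face CLI command, not something specific to SB3):

huggingface-cli login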

Then, in this example, we train a PPO agent to play CartPole-v1 and push it to a new repo sb3/demo-hf-CartPole-v1.

With package_to_hub()

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

from huggingface_sb3 import package_to_hub

# Create the environment
env_id = "CartPole-v1"
env = make_vec_env(env_id, n_envs=1)

# Create the evaluation environment
eval_env = make_vec_env(env_id, n_envs=1)

# Instantiate the agent
model = PPO("MlpPolicy", env, verbose=1)

# Train the agent
model.learn(total_timesteps=int(5000))

# This method saves, evaluates, generates a model card and records a replay video
# of your agent before pushing the repo to the hub
package_to_hub(
    model=model,
    model_name="ppo-CartPole-v1",
    model_architecture="PPO",
    env_id=env_id,
    eval_env=eval_env,
    repo_id="sb3/demo-hf-CartPole-v1",
    commit_message="Test commit",
)

You need to define seven parameters:

  1. model: the trained model to upload.
  2. model_name: the name of the saved model (here ppo-CartPole-v1).
  3. model_architecture: the algorithm used (here PPO).
  4. env_id: the id of the environment.
  5. eval_env: the environment used to evaluate the agent.
  6. repo_id: the id of the model repository on the Hugging Face Hub ({organization}/{repo_name}).
  7. commit_message: the message for the commit.

With push_to_hub()

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

from huggingface_sb3 import push_to_hub

# Create the environment
env_id = "CartPole-v1"
env = make_vec_env(env_id, n_envs=1)

# Instantiate the agent
model = PPO("MlpPolicy", env, verbose=1)

# Train the agent
model.learn(total_timesteps=int(5000))

# Save the model
model.save("ppo-CartPole-v1")

# Push this saved model .zip file to the hf repo
# If this repo does not exist it will be created
# repo_id = id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name})
# filename: the name of the file == "name" inside model.save("ppo-CartPole-v1")
push_to_hub(
    repo_id="sb3/demo-hf-CartPole-v1",
    filename="ppo-CartPole-v1.zip",
    commit_message="Added CartPole-v1 model trained with PPO",
)

You need to define three parameters:

  1. repo_id: the id of the model repository on the Hugging Face Hub ({organization}/{repo_name}).
  2. filename: the name of the saved model zip file (here ppo-CartPole-v1.zip).
  3. commit_message: the message for the commit.

MLflow

If you want to use MLflow to track your SB3 experiments, you can adapt the following code, which defines a custom logger output:

import sys
from typing import Any, Dict, Tuple, Union

import mlflow
import numpy as np

from stable_baselines3 import SAC
from stable_baselines3.common.logger import HumanOutputFormat, KVWriter, Logger


class MLflowOutputFormat(KVWriter):
    """
    Dumps key/value pairs into MLflow's numeric format.
    """

    def write(
        self,
        key_values: Dict[str, Any],
        key_excluded: Dict[str, Union[str, Tuple[str, ...]]],
        step: int = 0,
    ) -> None:
        for (key, value), (_, excluded) in zip(
            sorted(key_values.items()), sorted(key_excluded.items())
        ):
            if excluded is not None and "mlflow" in excluded:
                continue

            if isinstance(value, np.ScalarType):
                if not isinstance(value, str):
                    mlflow.log_metric(key, value, step)


loggers = Logger(
    folder=None,
    output_formats=[HumanOutputFormat(sys.stdout), MLflowOutputFormat()],
)

with mlflow.start_run():
    model = SAC("MlpPolicy", "Pendulum-v1", verbose=2)
    # Set custom logger
    model.set_logger(loggers)
    model.learn(total_timesteps=10000, log_interval=1)
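A possible extension (a sketch reusing the loggers object and imports from the snippet above; the run name and parameter keys below are only illustrative): log the hyperparameters of the run alongside the metrics, then inspect everything in the local MLflow UI.

with mlflow.start_run(run_name="sac_pendulum"):
    # Log a few hyperparameters next to the metrics written by MLflowOutputFormat
    mlflow.log_params({"algo": "SAC", "env_id": "Pendulum-v1", "total_timesteps": 10000})
    model = SAC("MlpPolicy", "Pendulum-v1", verbose=2)
    model.set_logger(loggers)
    model.learn(total_timesteps=10000, log_interval=1)

Start the tracking UI with the mlflow ui command and open the printed local address to browse the run.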