Vectorized Environments (Stable Baselines3 2.8.0a1 documentation)

Vectorized Environments are a method for stacking multiple independent environments into a single environment. Instead of training an RL agent on 1 environment per step, it allows us to train it on n environments per step. Because of this, actions passed to the environment are now a vector (of dimension n). It is the same for observations, rewards and end of episode signals (dones). In the case of non-array observation spaces such as Dict or Tuple, where different sub-spaces may have different shapes, the sub-observations are vectors (of dimension n).

| Name | Box | Discrete | Dict | Tuple | Multi Processing |
|---|---|---|---|---|---|
| DummyVecEnv | ✔️ | ✔️ | ✔️ | ✔️ | ❌️ |
| SubprocVecEnv | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |

Note

Vectorized environments are required when using wrappers for frame-stacking or normalization.

Note

When using vectorized environments, the environments are automatically reset at the end of each episode. Thus, the observation returned for the i-th environment when done[i] is true will in fact be the first observation of the next episode, not the last observation of the episode that has just terminated. You can access the “real” final observation of the terminated episode—that is, the one that accompanied the done event provided by the underlying environment—using the terminal_observation keys in the info dicts returned by the VecEnv.

Warning

When defining a custom VecEnv (for instance, using gym3 ProcgenEnv), you should provide terminal_observation keys in the info dicts returned by the VecEnv (cf. note above).

Warning

When using SubprocVecEnv with the forkserver or spawn start method (spawn is the default on Windows), users must wrap the code in an if __name__ == "__main__": block. On Linux, the default start method is fork, which is not thread-safe and can create deadlocks.

For more information, see Python’s multiprocessing guidelines.

VecEnv API vs Gym API

For consistency across Stable-Baselines3 (SB3) versions and because of its special requirements and features, the SB3 VecEnv API is not the same as the Gym API. The SB3 VecEnv API is actually close to the Gym 0.21 API but differs from the Gym 0.26+ API:

    # done is True at the end of an episode
    # dones[env_idx] = terminated[env_idx] or truncated[env_idx]
    # In SB3, truncated and terminated are mutually exclusive
    # infos[env_idx]["TimeLimit.truncated"] = truncated and not terminated
    # terminated[env_idx] tells you whether you should bootstrap or not:
    # when the episode has not ended or when the termination was a timeout/truncation
    # terminated[env_idx] = dones[env_idx] and not infos[env_idx]["TimeLimit.truncated"]
    should_bootstrap[env_idx] = not terminated[env_idx]
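The relations above can be checked on a small hand-written batch. The `dones` and `infos` values below are made up to cover the three possible cases; they are not produced by a real environment.

```python
import numpy as np

# Hypothetical outputs of vec_env.step() for three environments:
# env 0 is still running, env 1 hit a time limit, env 2 reached a true terminal state
dones = np.array([False, True, True])
infos = [{}, {"TimeLimit.truncated": True}, {"TimeLimit.truncated": False}]

# The key may be absent from the info dict, hence the default of False
truncated = np.array([info.get("TimeLimit.truncated", False) for info in infos])
terminated = dones & ~truncated
should_bootstrap = ~terminated

print(terminated.tolist())        # [False, False, True]
print(should_bootstrap.tolist())  # [True, True, False]
```

Only the third environment reached a genuine terminal state, so it is the only one whose next-state value should be treated as zero.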

Modifying Vectorized Environments Attributes

If you plan to modify the attributes of an environment while it is used (e.g., modifying an attribute specifying the task carried out for a portion of training when doing multi-task learning, or a parameter of the environment dynamics), you must expose a setter method. In fact, directly accessing the environment attribute in the callback can lead to unexpected behavior because environments can be wrapped (using gym or VecEnv wrappers, the Monitor wrapper being one example).

Consider the following example for a custom env:

    import gymnasium as gym
    from gymnasium import spaces

    from stable_baselines3.common.env_util import make_vec_env


    class MyMultiTaskEnv(gym.Env):

        def __init__(self):
            """
            A state and action space for robotic locomotion.
            The multi-task twist is that the policy would need to adapt to different terrains,
            each with its own friction coefficient, mu.
            The friction coefficient is the only parameter that changes between tasks.
            mu is a scalar between 0 and 1, and during training a callback is used to update mu.
            """
            super().__init__()
            ...

        def step(self, action):
            # Do something: depending on the action and the current value of mu, the next state is computed
            return self._get_obs(), reward, terminated, truncated, info

        def set_mu(self, new_mu: float) -> None:
            # Note: this value should be used only at the next reset
            self.mu = new_mu


    # Example of wrapped env
    # env is of type <TimeLimit<OrderEnforcing<PassiveEnvChecker<CartPoleEnv>>>>
    env = gym.make("CartPole-v1")
    # To access the base env, without wrapper, you should use .unwrapped
    # or env.get_wrapper_attr("gravity") to include wrappers
    env.unwrapped.gravity
    # SB3 uses VecEnv for training, where env.unwrapped.x = new_value cannot be used to set an attribute
    # therefore, you should expose a setter like set_mu to properly set an attribute
    vec_env = make_vec_env(MyMultiTaskEnv)
    # Print current mu value
    # Note: you should use vec_env.env_method("get_wrapper_attr", "mu") in Gymnasium v1.0
    print(vec_env.env_method("get_wrapper_attr", "mu"))
    # Change mu attribute via the setter (set_mu takes a single argument, the new value)
    vec_env.env_method("set_mu", 0.1)
    # If the variable exists, you can also use set_wrapper_attr to set it
    assert vec_env.has_attr("mu")
    vec_env.env_method("set_wrapper_attr", "mu", 0.1)

In this example env.mu cannot be accessed/changed directly because it is wrapped in a VecEnv and because it could be wrapped with other wrappers (see GH#1573 for a longer explanation). Instead, the callback should use the set_mu method via the env_method method for Vectorized Environments.

    from itertools import cycle

    from stable_baselines3.common.callbacks import BaseCallback


    class ChangeMuCallback(BaseCallback):
        """
        This callback changes the value of mu during training, looping
        through a list of values until training is aborted.
        The environment is implemented so that the impact of changing
        the value of mu mid-episode is visible only after the episode is over
        and the reset method has been called.
        """

        def __init__(self):
            super().__init__()
            # An iterator that cycles through the different values of the friction coefficient
            self.mus = cycle([0.1, 0.2, 0.5, 0.13, 0.9])

        def _on_step(self) -> bool:
            # Note: in practice, you should not change this value at every step
            # but rather depending on some events/metrics like agent performance/episode termination
            # both accessible via the self.logger or self.locals variables
            self.training_env.env_method("set_mu", next(self.mus))
            # Returning False would abort training early
            return True

This callback can then be used to safely modify environment attributes during training since it calls the environment setter method.

Vectorized Environments Wrappers

If you want to alter or augment a VecEnv without redefining it completely (e.g. stack multiple frames, monitor the VecEnv, normalize the observation, …), you can use a VecEnvWrapper. These wrappers are the vectorized equivalents of gym.Wrapper, i.e., they act on multiple environments at the same time.

You can find below an example for extracting one key from the observation:

    import gymnasium as gym
    import numpy as np

    from stable_baselines3.common.vec_env import DummyVecEnv
    from stable_baselines3.common.vec_env.base_vec_env import VecEnv, VecEnvStepReturn, VecEnvWrapper


    class VecExtractDictObs(VecEnvWrapper):
        """
        A vectorized wrapper for filtering a specific key from dictionary observations.
        Similar to Gym's FilterObservation wrapper:
            https://github.com/openai/gym/blob/master/gym/wrappers/filter_observation.py

        :param venv: The vectorized environment
        :param key: The key of the dictionary observation
        """

        def __init__(self, venv: VecEnv, key: str):
            self.key = key
            super().__init__(venv=venv, observation_space=venv.observation_space.spaces[self.key])

        def reset(self) -> np.ndarray:
            obs = self.venv.reset()
            return obs[self.key]

        def step_async(self, actions: np.ndarray) -> None:
            self.venv.step_async(actions)

        def step_wait(self) -> VecEnvStepReturn:
            obs, reward, done, info = self.venv.step_wait()
            return obs[self.key], reward, done, info


    # Any environment with a Dict observation space works here
    env = DummyVecEnv([lambda: gym.make("FetchReach-v1")])
    # Wrap the VecEnv
    env = VecExtractDictObs(env, key="observation")

Note

When creating a vectorized environment, you can also specify ordinary gymnasium wrappers to wrap each of the sub-environments. See the make_vec_env documentation for details. Example:

    from gymnasium.wrappers import RescaleAction

    from stable_baselines3.common.env_util import make_vec_env

    # Use gym wrapper for each sub-env of the VecEnv
    wrapper_kwargs = dict(min_action=-1.0, max_action=1.0)
    vec_env = make_vec_env(
        "Pendulum-v1", n_envs=2, wrapper_class=RescaleAction, wrapper_kwargs=wrapper_kwargs
    )

VecEnv

class stable_baselines3.common.vec_env.VecEnv(num_envs, observation_space, action_space)[source]

An abstract asynchronous, vectorized environment.

Parameters:

abstractmethod close()[source]

Clean up the environment’s resources.

Return type:

None

abstractmethod env_is_wrapped(wrapper_class, indices=None)[source]

Check if environments are wrapped with a given wrapper.

Parameters:

Returns:

True if the env is wrapped, False otherwise, for each env queried.

Return type:

list[bool]

abstractmethod env_method(method_name, *method_args, indices=None, **method_kwargs)[source]

Call instance methods of vectorized environments.

Parameters:

Returns:

List of items returned by the environment’s method call

Return type:

list[Any]

abstractmethod get_attr(attr_name, indices=None)[source]

Return attribute from vectorized environment.

Parameters:

Returns:

List of values of ‘attr_name’ in all environments

Return type:

list[Any]

get_images()[source]

Return RGB images from each environment when available

Return type:

Sequence[ndarray | None]

getattr_depth_check(name, already_found)[source]

Check if an attribute reference is being hidden in a recursive call to __getattr__

Parameters:

Returns:

name of module whose attribute is being shadowed, if any.

Return type:

str | None

has_attr(attr_name)[source]

Check if an attribute exists for a vectorized environment.

Parameters:

attr_name (str) – The name of the attribute to check

Returns:

True if ‘attr_name’ exists in all environments

Return type:

bool

render(mode=None)[source]

Gym environment rendering

Parameters:

mode (str | None) – the rendering type

Return type:

ndarray | None

abstractmethod reset()[source]

Reset all the environments and return an array of observations, or a tuple of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

Returns:

observation

Return type:

ndarray | dict[str, ndarray] | tuple[ndarray, …]

seed(seed=None)[source]

Sets the random seeds for all environments, based on a given seed. Each individual environment will still get its own seed, by incrementing the given seed. WARNING: since gym 0.26, those seeds will only be passed to the environment at the next reset.

Parameters:

seed (int | None) – The random seed. May be None for completely random seeding.

Returns:

Returns a list containing the seeds for each individual env. Note that all list elements may be None, if the env does not return anything when being seeded.

Return type:

Sequence[None | int]

abstractmethod set_attr(attr_name, value, indices=None)[source]

Set attribute inside vectorized environments.

Parameters:

Returns:

Return type:

None

set_options(options=None)[source]

Set environment options for all environments. If a dict is passed instead of a list, the same options will be used for all environments. WARNING: Those options will only be passed to the environment at the next reset.

Parameters:

options (list [ dict ] | dict | None) – A dictionary of environment options to pass to each environment at the next reset.

Return type:

None

step(actions)[source]

Step the environments with the given action

Parameters:

actions (ndarray) – the action

Returns:

observation, reward, done, information

Return type:

tuple[ndarray | dict[str, ndarray] | tuple[ndarray, …], ndarray, ndarray, list[dict]]

abstractmethod step_async(actions)[source]

Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.

You should not call this if a step_async run is already pending.

Parameters:

actions (ndarray)

Return type:

None

abstractmethod step_wait()[source]

Wait for the step taken with step_async().

Returns:

observation, reward, done, information

Return type:

tuple[ndarray | dict[str, ndarray] | tuple[ndarray, …], ndarray, ndarray, list[dict]]

DummyVecEnv

class stable_baselines3.common.vec_env.DummyVecEnv(env_fns)[source]

Creates a simple vectorized wrapper for multiple environments, calling each environment in sequence on the current Python process. This is useful for computationally simple environments such as CartPole-v1, as the overhead of multiprocessing or multithreading outweighs the environment computation time. It can also be used for RL methods that require a vectorized environment when you only want to train with a single environment.

Parameters:

env_fns (list [ Callable [ [ ] , Env ] ]) – a list of functions that return environments to vectorize

Raises:

ValueError – If the same environment instance is passed as the output of two or more different env_fn.

close()[source]

Clean up the environment’s resources.

Return type:

None

env_is_wrapped(wrapper_class, indices=None)[source]

Check if worker environments are wrapped with a given wrapper

Parameters:

Return type:

list[bool]

env_method(method_name, *method_args, indices=None, **method_kwargs)[source]

Call instance methods of vectorized environments.

Parameters:

Return type:

list[Any]

get_attr(attr_name, indices=None)[source]

Return attribute from vectorized environment (see base class).

Parameters:

Return type:

list[Any]

get_images()[source]

Return RGB images from each environment when available

Return type:

Sequence[ndarray | None]

render(mode=None)[source]

Gym environment rendering. If there are multiple environments then they are tiled together in one image via BaseVecEnv.render().

Parameters:

mode (str | None) – The rendering type.

Return type:

ndarray | None

reset()[source]

Reset all the environments and return an array of observations, or a tuple of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

Returns:

observation

Return type:

ndarray | dict[str, ndarray] | tuple[ndarray, …]

set_attr(attr_name, value, indices=None)[source]

Set attribute inside vectorized environments (see base class).

Parameters:

Return type:

None

step_async(actions)[source]

Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.

You should not call this if a step_async run is already pending.

Parameters:

actions (ndarray)

Return type:

None

step_wait()[source]

Wait for the step taken with step_async().

Returns:

observation, reward, done, information

Return type:

tuple[ndarray | dict[str, ndarray] | tuple[ndarray, …], ndarray, ndarray, list[dict]]

SubprocVecEnv

class stable_baselines3.common.vec_env.SubprocVecEnv(env_fns, start_method=None)[source]

Creates a multiprocess vectorized wrapper for multiple environments, distributing each environment to its own process, allowing significant speed up when the environment is computationally complex.

For performance reasons, if your environment is not IO bound, the number of environments should not exceed the number of logical cores on your CPU.

Warning

Only ‘forkserver’ and ‘spawn’ start methods are thread-safe, which is important when TensorFlow sessions or other non-thread-safe libraries are used in the parent (see issue #217). However, compared to ‘fork’ they incur a small start-up cost and have restrictions on global variables. With those methods, users must wrap the code in an if __name__ == "__main__": block. For more information, see the multiprocessing documentation.

Parameters:

close()[source]

Clean up the environment’s resources.

Return type:

None

env_is_wrapped(wrapper_class, indices=None)[source]

Check if worker environments are wrapped with a given wrapper

Parameters:

Return type:

list[bool]

env_method(method_name, *method_args, indices=None, **method_kwargs)[source]

Call instance methods of vectorized environments.

Parameters:

Return type:

list[Any]

get_attr(attr_name, indices=None)[source]

Return attribute from vectorized environment (see base class).

Parameters:

Return type:

list[Any]

get_images()[source]

Return RGB images from each environment when available

Return type:

Sequence[ndarray | None]

has_attr(attr_name)[source]

Check if an attribute exists for a vectorized environment. (see base class).

Parameters:

attr_name (str)

Return type:

bool

reset()[source]

Reset all the environments and return an array of observations, or a tuple of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

Returns:

observation

Return type:

ndarray | dict[str, ndarray] | tuple[ndarray, …]

set_attr(attr_name, value, indices=None)[source]

Set attribute inside vectorized environments (see base class).

Parameters:

Return type:

None

step_async(actions)[source]

Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.

You should not call this if a step_async run is already pending.

Parameters:

actions (ndarray)

Return type:

None

step_wait()[source]

Wait for the step taken with step_async().

Returns:

observation, reward, done, information

Return type:

tuple[ndarray | dict[str, ndarray] | tuple[ndarray, …], ndarray, ndarray, list[dict]]

Wrappers

VecFrameStack

class stable_baselines3.common.vec_env.VecFrameStack(venv, n_stack, channels_order=None)[source]

Frame stacking wrapper for vectorized environment. Designed for image observations.

Parameters:

reset()[source]

Reset all environments

Return type:

ndarray | dict[str, ndarray]

step_wait()[source]

Wait for the step taken with step_async().

Returns:

observation, reward, done, information

Return type:

tuple[ndarray | dict[str, ndarray], ndarray, ndarray, list[dict[str, Any]]]

StackedObservations

class stable_baselines3.common.vec_env.stacked_observations.StackedObservations(num_envs, n_stack, observation_space, channels_order=None)[source]

Frame stacking wrapper for data.

Dimension to stack over is either first (channels-first) or last (channels-last), which is detected automatically using common.preprocessing.is_image_space_channels_first if observation is an image space.

Parameters:

static compute_stacking(n_stack, observation_space, channels_order=None)[source]

Calculates the parameters in order to stack observations

Parameters:

Returns:

Tuple of channels_first, stack_dimension, stackedobs, repeat_axis

Return type:

tuple[bool, int, tuple[int, …], int]

reset(observation)[source]

Reset the stacked_obs, add the reset observation to the stack, and return the stack.

Parameters:

observation (TObs) – Reset observation

Returns:

The stacked reset observation

Return type:

TObs

update(observations, dones, infos)[source]

Add the observations to the stack and use the dones to update the infos.

Parameters:

Returns:

Tuple of the stacked observations and the updated infos

Return type:

tuple[TObs, list[dict[str, Any]]]

VecNormalize

class stable_baselines3.common.vec_env.VecNormalize(venv, training=True, norm_obs=True, norm_reward=True, clip_obs=10.0, clip_reward=10.0, gamma=0.99, epsilon=1e-08, norm_obs_keys=None)[source]

A moving average, normalizing wrapper for vectorized environments. It supports saving and loading the moving average.

Parameters:

get_original_obs()[source]

Returns an unnormalized version of the observations from the most recent step or reset.

Return type:

ndarray | dict[str, ndarray]

get_original_reward()[source]

Returns an unnormalized version of the rewards from the most recent step.

Return type:

ndarray

static load(load_path, venv)[source]

Loads a saved VecNormalize object.

Parameters:

Returns:

Return type:

VecNormalize

normalize_obs(obs)[source]

Normalize observations using this VecNormalize’s observations statistics. Calling this method does not update statistics.

Parameters:

obs (ndarray | dict [ str , ndarray ])

Return type:

ndarray | dict[str, ndarray]

normalize_reward(reward)[source]

Normalize rewards using this VecNormalize’s rewards statistics. Calling this method does not update statistics.

Parameters:

reward (ndarray)

Return type:

ndarray

reset()[source]

Reset all environments

Returns:

first observation of the episode

Return type:

ndarray | dict[str, ndarray]

save(save_path)[source]

Save current VecNormalize object with all running statistics and settings (e.g. clip_obs)

Parameters:

save_path (str) – The path to save to

Return type:

None

set_venv(venv)[source]

Sets the vector environment to wrap to venv.

Also sets attributes derived from this such as num_envs.

Parameters:

venv (VecEnv)

Return type:

None

step_wait()[source]

Apply a sequence of actions to a sequence of environments: actions -> (observations, rewards, dones), where dones is a boolean vector indicating whether each element is new.

Return type:

tuple[ndarray | dict[str, ndarray] | tuple[ndarray, …], ndarray, ndarray, list[dict]]

VecVideoRecorder

class stable_baselines3.common.vec_env.VecVideoRecorder(venv, video_folder, record_video_trigger, video_length=200, name_prefix='rl-video')[source]

Wraps a VecEnv or VecEnvWrapper object to record rendered images as an mp4 video. It requires ffmpeg or avconv to be installed on the machine.

Note: for now, it can only record one video, and every video must have at least two frames.

The video recorder code was adapted from Gymnasium v1.0.

Parameters:

close()[source]

Closes the wrapper then the video recorder.

Return type:

None

reset()[source]

Reset all the environments and return an array of observations, or a tuple of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

Returns:

observation

Return type:

ndarray | dict[str, ndarray] | tuple[ndarray, …]

step_wait()[source]

Wait for the step taken with step_async().

Returns:

observation, reward, done, information

Return type:

tuple[ndarray | dict[str, ndarray] | tuple[ndarray, …], ndarray, ndarray, list[dict]]

VecCheckNan

class stable_baselines3.common.vec_env.VecCheckNan(venv, raise_exception=False, warn_once=True, check_inf=True)[source]

NaN and inf checking wrapper for vectorized environments. It raises a warning by default, letting you know where the NaN or inf originated.

Parameters:

check_array_value(name, value)[source]

Check for inf and NaN for a single numpy array.

Parameters:

Returns:

A list of issues found.

Return type:

list[tuple[str, str]]

reset()[source]

Reset all the environments and return an array of observations, or a tuple of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

Returns:

observation

Return type:

ndarray | dict[str, ndarray] | tuple[ndarray, …]

step_async(actions)[source]

Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.

You should not call this if a step_async run is already pending.

Parameters:

actions (ndarray)

Return type:

None

step_wait()[source]

Wait for the step taken with step_async().

Returns:

observation, reward, done, information

Return type:

tuple[ndarray | dict[str, ndarray] | tuple[ndarray, …], ndarray, ndarray, list[dict]]

VecTransposeImage

class stable_baselines3.common.vec_env.VecTransposeImage(venv, skip=False)[source]

Re-order channels, from HxWxC to CxHxW. It is required for PyTorch convolution layers.

Parameters:

close()[source]

Clean up the environment’s resources.

Return type:

None

reset()[source]

Reset all environments

Return type:

ndarray | dict

step_wait()[source]

Wait for the step taken with step_async().

Returns:

observation, reward, done, information

Return type:

tuple[ndarray | dict[str, ndarray] | tuple[ndarray, …], ndarray, ndarray, list[dict]]

static transpose_image(image)[source]

Transpose an image or batch of images (re-order channels).

Parameters:

image (ndarray)

Returns:

Return type:

ndarray

transpose_observations(observations)[source]

Transpose (if needed) and return new observations.

Parameters:

observations (ndarray | dict)

Returns:

Transposed observations

Return type:

ndarray | dict

static transpose_space(observation_space, key='')[source]

Transpose an observation space (re-order channels).

Parameters:

Returns:

Return type:

Box

VecMonitor

class stable_baselines3.common.vec_env.VecMonitor(venv, filename=None, info_keywords=())[source]

A vectorized monitor wrapper for vectorized Gym environments, it is used to record the episode reward, length, time and other data.

Some environments like openai/procgen or gym3 directly initialize the vectorized environments, without giving us a chance to use the Monitor wrapper. So this class simply does the job of the Monitor wrapper on a vectorized level.

Parameters:

close()[source]

Clean up the environment’s resources.

Return type:

None

reset()[source]

Reset all the environments and return an array of observations, or a tuple of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

Returns:

observation

Return type:

ndarray | dict[str, ndarray] | tuple[ndarray, …]

step_wait()[source]

Wait for the step taken with step_async().

Returns:

observation, reward, done, information

Return type:

tuple[ndarray | dict[str, ndarray] | tuple[ndarray, …], ndarray, ndarray, list[dict]]