isaaclab.envs — Isaac Lab Documentation

isaaclab.envs#

Sub-package for environment definitions.

Environments define the interface between the agent and the simulation. In the simplest case, the environment provides the agent with the current observations and executes the actions provided by the agent. However, the environment can also provide additional signals such as the current reward, the done flag, and information about the current episode.

There are two types of environment design workflows:

Based on these workflows, there are the following environment classes for single and multi-agent RL:

Single-Agent RL:

Multi-Agent RL (MARL):

For more information about the workflow design patterns, see the Task Design Workflows section.

Submodules

Classes

Manager Based Environment#

class isaaclab.envs.ManagerBasedEnv[source]#

The base environment encapsulates the simulation scene and the environment managers for the manager-based workflow.

While a simulation scene or world comprises different components such as robots, objects, and sensors (cameras, lidars, etc.), the environment is a higher-level abstraction that provides an interface for interacting with the simulation. The environment consists of the following components:

The environment provides a unified interface for interacting with the simulation. However, it does not include task-specific quantities such as the reward function, or the termination conditions. These quantities are often specific to defining Markov Decision Processes (MDPs) while the base environment is agnostic to the MDP definition.

The environment steps forward in time at a fixed time-step, while the physics simulation runs at a smaller time-step to keep the simulation stable. These two time-steps can be configured independently using the ManagerBasedEnvCfg.decimation (number of simulation steps per environment step) and the ManagerBasedEnvCfg.sim.dt (physics time-step) parameters. The environment time-step is computed as the product of the two, and the two values can be obtained by querying the physics_dt and the step_dt properties respectively.
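As a worked illustration of the relationship described above, the sketch below uses hypothetical values; in a running environment the same quantities are available through the physics_dt and step_dt properties.

    # Hypothetical values for illustration only.
    sim_dt = 0.005   # physics time-step, set via ManagerBasedEnvCfg.sim.dt
    decimation = 4   # simulation steps per environment step, ManagerBasedEnvCfg.decimation

    # The environment time-step is the product of the two.
    step_dt = decimation * sim_dt   # -> 0.02 s, i.e. the value returned by env.step_dt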

Methods:

Attributes:

__init__(cfg: ManagerBasedEnvCfg)[source]#

Initialize the environment.

Parameters:

cfg – The configuration object for the environment.

Raises:

RuntimeError – If a simulation context already exists. The environment must always create one since it configures the simulation context and controls the simulation.

property num_envs_: int_#

The number of instances of the environment that are running.

property physics_dt_: float_#

The physics time-step (in s).

This is the smallest time-step at which the physics simulation advances.

property step_dt_: float_#

The environment stepping time-step (in s).

This is the time-step at which the environment steps forward.

property device#

The device on which the environment is running.

load_managers()[source]#

Load the managers for the environment.

This function is responsible for creating the various managers (action, observation, events, etc.) for the environment. Since the managers require access to physics handles, they can only be created after the simulator is reset (i.e. played for the first time).

Note

In the case of a standalone application (when running the simulator from Python), this function is called automatically when the class is initialized.

However, in extension mode, the user must call this function manually after the simulator is reset. This is because the simulator is only reset when the user calls SimulationContext.reset_async(), and it is not possible to call async functions in the constructor.

setup_manager_visualizers()[source]#

Creates live visualizers for manager terms.

reset(seed: int | None = None, env_ids: Sequence[int] | None = None, options: dict[str, Any] | None = None) → tuple[Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]], dict][source]#

Resets the specified environments and returns observations.

This function calls the _reset_idx() function to reset the specified environments. However, certain operations, such as procedural terrain generation, that happened during initialization are not repeated.

Parameters:

Returns:

A tuple containing the observations and extras.

reset_to(state: dict[str, dict[str, dict[str, torch.Tensor]]], env_ids: Sequence[int] | None, seed: int | None = None, is_relative: bool = False)[source]#

Resets specified environments to provided states.

This function resets the environments to the provided states. The state is a dictionary containing the state of the scene entities. Please refer to InteractiveScene.get_state() for the format.

The function is different from the reset() function as it resets the environments to specific states, instead of using the randomization events for resetting the environments.
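As a hedged sketch (not an exact recipe), the snippet below captures a scene state and later restores it with reset_to(). It assumes the scene state is obtained through env.scene and InteractiveScene.get_state(); the is_relative flag mirrors the reset_to() signature above.

    # Assumes `env` is an initialized ManagerBasedEnv.
    # Capture the current state of all scene entities (format per InteractiveScene.get_state()).
    state = env.scene.get_state(is_relative=True)

    # ... step the environment, collect data, etc. ...

    # Restore every environment to the captured state without triggering randomization events.
    env.reset_to(state, env_ids=None, is_relative=True)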

Parameters:

step(action: torch.Tensor) → tuple[Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]], dict][source]#

Execute one time-step of the environment’s dynamics.

The environment steps forward at a fixed time-step, while the physics simulation is decimated at a lower time-step. This is to ensure that the simulation is stable. These two time-steps can be configured independently using the ManagerBasedEnvCfg.decimation (number of simulation steps per environment step) and the ManagerBasedEnvCfg.sim.dt (physics time-step). Based on these parameters, the environment time-step is computed as the product of the two.

Parameters:

action – The actions to apply on the environment. Shape is (num_envs, action_dim).

Returns:

A tuple containing the observations and extras.
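A minimal usage sketch for the base environment is shown below. The configuration class MyEnvCfg is hypothetical (a user-defined ManagerBasedEnvCfg subclass), the action dimension is a placeholder, and the simulation app is assumed to have been launched already.

    import torch

    from isaaclab.envs import ManagerBasedEnv

    env = ManagerBasedEnv(cfg=MyEnvCfg())   # MyEnvCfg is a hypothetical config class
    obs, extras = env.reset()

    action_dim = 8   # hypothetical; must match the configured action terms
    for _ in range(100):
        actions = torch.zeros(env.num_envs, action_dim, device=env.device)
        obs, extras = env.step(actions)

    env.close()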

static seed(seed: int = -1) → int[source]#

Set the seed for the environment.

Parameters:

seed – The seed for random generator. Defaults to -1.

Returns:

The seed used for random generator.

close()[source]#

Cleanup for the environment.

class isaaclab.envs.ManagerBasedEnvCfg[source]#

Base configuration of the environment.

Attributes:

Classes:

viewer_: ViewerCfg_#

Viewer configuration. Default is ViewerCfg().

sim_: SimulationCfg_#

Physics simulation configuration. Default is SimulationCfg().

ui_window_class_type#

alias of BaseEnvWindow

seed_: int | None_#

The seed for the random number generator. Defaults to None, in which case the seed is not set.

Note

The seed is set at the beginning of the environment initialization. This ensures that the environment creation is deterministic and behaves similarly across different runs.

decimation_: int_#

Number of control action updates @ sim dt per policy dt.

For instance, if the simulation dt is 0.01s and the policy dt is 0.1s, then the decimation is 10. This means that the control action is updated every 10 simulation steps.

scene_: InteractiveSceneCfg_#

Scene settings.

Please refer to the isaaclab.scene.InteractiveSceneCfg class for more details.

recorders_: object_#

Recorder settings. Defaults to recording nothing.

Please refer to the isaaclab.managers.RecorderManager class for more details.

observations_: object_#

Observation space settings.

Please refer to the isaaclab.managers.ObservationManager class for more details.

actions_: object_#

Action space settings.

Please refer to the isaaclab.managers.ActionManager class for more details.

events_: object_#

Event settings. Defaults to the basic configuration that resets the scene to its default state.

Please refer to the isaaclab.managers.EventManager class for more details.

rerender_on_reset_: bool_#

Whether a render step is performed again after at least one environment has been reset. Defaults to False, which means no render step will be performed after reset.

wait_for_textures_: bool_#

True to wait for assets to be loaded completely, False otherwise. Defaults to True.

xr_: XrCfg | None_#

Configuration for viewing and interacting with the environment through an XR device.

Manager Based RL Environment#

class isaaclab.envs.ManagerBasedRLEnv[source]#

Bases: ManagerBasedEnv, Env

The superclass for the manager-based workflow reinforcement learning-based environments.

This class inherits from ManagerBasedEnv and implements the core functionality for reinforcement learning-based environments. It is designed to be used with any RL library. The class is designed to be used with vectorized environments, i.e., the environment is expected to be run in parallel with multiple sub-environments. The number of sub-environments is specified using the num_envs.

Each observation from the environment is a batch of observations, one for each sub-environment. The step() method is likewise expected to receive a batch of actions, one for each sub-environment.

While the environment itself is implemented as a vectorized environment, we do not inherit from gym.vector.VectorEnv. This is mainly because the class adds various methods (for wait and asynchronous updates) which are not required. Additionally, each RL library typically has its own definition for a vectorized environment. Thus, to reduce complexity, we directly use the gym.Env over here and leave it up to library-defined wrappers to take care of wrapping this environment for their agents.

Note

For vectorized environments, it is recommended to only call the reset() method once before the first call to step(), i.e. after the environment is created. After that, the step() function handles the reset of terminated sub-environments. This is because the simulator does not support resetting individual sub-environments in a vectorized environment.

Attributes:

Methods:

is_vector_env_: ClassVar[bool]_ = True#

Whether the environment is a vectorized environment.

metadata_: ClassVar[dict[str, Any]]_ = {'isaac_sim_version': isaacsim.core.version.get_version, 'render_modes': [None, 'human', 'rgb_array']}#

Metadata for the environment.

cfg_: ManagerBasedRLEnvCfg_#

Configuration for the environment.

__init__(cfg: ManagerBasedRLEnvCfg, render_mode: str | None = None, **kwargs)[source]#

Initialize the environment.

Parameters:

property max_episode_length_s_: float_#

Maximum episode length in seconds.

property max_episode_length_: int_#

Maximum episode length in environment steps.

load_managers()[source]#

Load the managers for the environment.

This function is responsible for creating the various managers (action, observation, events, etc.) for the environment. Since the managers require access to physics handles, they can only be created after the simulator is reset (i.e. played for the first time).

Note

In the case of a standalone application (when running the simulator from Python), this function is called automatically when the class is initialized.

However, in extension mode, the user must call this function manually after the simulator is reset. This is because the simulator is only reset when the user calls SimulationContext.reset_async(), and it is not possible to call async functions in the constructor.

setup_manager_visualizers()[source]#

Creates live visualizers for manager terms.

step(action: torch.Tensor) → tuple[Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]], torch.Tensor, torch.Tensor, torch.Tensor, dict][source]#

Execute one time-step of the environment’s dynamics and reset terminated environments.

Unlike ManagerBasedEnv.step(), this function performs the following operations:

  1. Process the actions.
  2. Perform physics stepping.
  3. Perform rendering if gui is enabled.
  4. Update the environment counters and compute the rewards and terminations.
  5. Reset the environments that terminated.
  6. Compute the observations.
  7. Return the observations, rewards, resets and extras.

Parameters:

action – The actions to apply on the environment. Shape is (num_envs, action_dim).

Returns:

A tuple containing the observations, rewards, resets (terminated and truncated) and extras.
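The sketch below shows the vectorized rollout pattern suggested in the note above: reset() is called once, after which step() resets terminated sub-environments internally. The environment instance and action dimension are assumed.

    import torch

    # `env` is assumed to be an initialized ManagerBasedRLEnv; the action dimension is hypothetical.
    obs, extras = env.reset()
    action_dim = 8

    for _ in range(1000):
        actions = torch.zeros(env.num_envs, action_dim, device=env.device)
        obs, rewards, terminated, truncated, extras = env.step(actions)
        # Per-environment done flags; sub-environments flagged here were already reset inside step().
        dones = terminated.bool() | truncated.bool()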

render(recompute: bool = False) → np.ndarray | None[source]#

Run rendering without stepping through the physics.

By convention, if mode is:

Parameters:

recompute – Whether to force a render even if the simulator has already rendered the scene. Defaults to False.

Returns:

The rendered image as a numpy array if mode is “rgb_array”. Otherwise, returns None.
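A hedged example of frame capture follows. The configuration class is hypothetical; the relevant parts are passing render_mode="rgb_array" at construction and reading the array returned by render().

    from isaaclab.envs import ManagerBasedRLEnv

    env = ManagerBasedRLEnv(cfg=MyRLEnvCfg(), render_mode="rgb_array")   # MyRLEnvCfg is hypothetical
    obs, extras = env.reset()

    frame = env.render()   # numpy array (expected shape: height x width x 3) since mode is "rgb_array"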

Raises:

close()[source]#

Cleanup for the environment.

property device#

The device on which the environment is running.

get_wrapper_attr(name: str) → Any#

Gets the attribute name from the environment.

has_wrapper_attr(name: str) → bool#

Checks if the attribute name exists in the environment.

property np_random_: numpy.random.Generator_#

Returns the environment's internal _np_random generator. If it is not set, it will be initialised with a random seed.

Returns:

Instances of np.random.Generator

property np_random_seed_: int_#

Returns the environment's internal _np_random_seed. If it is not set, it will first be initialised with a random integer seed.

If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.

Returns:

the seed of the current np_random or -1, if the seed of the rng is unknown

Return type:

int

property num_envs_: int_#

The number of instances of the environment that are running.

property physics_dt_: float_#

The physics time-step (in s).

This is the smallest time-step at which the physics simulation advances.

reset(seed: int | None = None, env_ids: Sequence[int] | None = None, options: dict[str, Any] | None = None) → tuple[Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]], dict]#

Resets the specified environments and returns observations.

This function calls the _reset_idx() function to reset the specified environments. However, certain operations, such as procedural terrain generation, that happened during initialization are not repeated.

Parameters:

Returns:

A tuple containing the observations and extras.

reset_to(state: dict[str, dict[str, dict[str, torch.Tensor]]], env_ids: Sequence[int] | None, seed: int | None = None, is_relative: bool = False)#

Resets specified environments to provided states.

This function resets the environments to the provided states. The state is a dictionary containing the state of the scene entities. Please refer to InteractiveScene.get_state() for the format.

The function is different from the reset() function as it resets the environments to specific states, instead of using the randomization events for resetting the environments.

Parameters:

static seed(seed: int = -1) → int#

Set the seed for the environment.

Parameters:

seed – The seed for random generator. Defaults to -1.

Returns:

The seed used for random generator.

set_wrapper_attr(name: str, value: Any, *, force: bool = True) → bool#

Sets the attribute name on the environment with value, see Wrapper.set_wrapper_attr for more info.

property step_dt_: float_#

The environment stepping time-step (in s).

This is the time-step at which the environment steps forward.

property unwrapped_: Env[ObsType, ActType]_#

Returns the base non-wrapped environment.

Returns:

The base non-wrapped gymnasium.Env instance

Return type:

Env

class isaaclab.envs.ManagerBasedRLEnvCfg[source]#

Bases: ManagerBasedEnvCfg

Configuration for a reinforcement learning environment with the manager-based workflow.

Classes:

Attributes:

ui_window_class_type#

alias of ManagerBasedRLEnvWindow

is_finite_horizon_: bool_#

Whether the learning task is treated as a finite or infinite horizon problem for the agent. Defaults to False, which means the task is treated as an infinite horizon problem.

This flag handles the subtleties of finite and infinite horizon tasks:

If True, then the environment is treated as a finite horizon problem and no time-out (or truncated) done signal is sent to the agent. If False, then the environment is treated as an infinite horizon problem and a time-out (or truncated) done signal is sent to the agent.

Note

The base ManagerBasedRLEnv class does not use this flag directly. It is used by the environment wrappers to determine what type of done signal to send to the corresponding learning agent.

episode_length_s_: float_#

Duration of an episode (in seconds).

Based on the decimation rate and physics time step, the episode length is calculated as:

episode_length_steps = ceil(episode_length_s / (decimation_rate * physics_time_step))

For example, if the decimation rate is 10, the physics time step is 0.01, and the episode length is 10 seconds, then the episode length in steps is 100.
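The same computation, written out in Python with the values quoted above:

    import math

    decimation_rate = 10
    physics_time_step = 0.01   # seconds
    episode_length_s = 10.0    # seconds

    episode_length_steps = math.ceil(episode_length_s / (decimation_rate * physics_time_step))   # 100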

rewards_: object_#

Reward settings.

Please refer to the isaaclab.managers.RewardManager class for more details.

viewer_: ViewerCfg_#

Viewer configuration. Default is ViewerCfg().

sim_: SimulationCfg_#

Physics simulation configuration. Default is SimulationCfg().

seed_: int | None_#

The seed for the random number generator. Defaults to None, in which case the seed is not set.

Note

The seed is set at the beginning of the environment initialization. This ensures that the environment creation is deterministic and behaves similarly across different runs.

decimation_: int_#

Number of control action updates @ sim dt per policy dt.

For instance, if the simulation dt is 0.01s and the policy dt is 0.1s, then the decimation is 10. This means that the control action is updated every 10 simulation steps.

scene_: InteractiveSceneCfg_#

Scene settings.

Please refer to the isaaclab.scene.InteractiveSceneCfg class for more details.

recorders_: object_#

Recorder settings. Defaults to recording nothing.

Please refer to the isaaclab.managers.RecorderManager class for more details.

observations_: object_#

Observation space settings.

Please refer to the isaaclab.managers.ObservationManager class for more details.

actions_: object_#

Action space settings.

Please refer to the isaaclab.managers.ActionManager class for more details.

events_: object_#

Event settings. Defaults to the basic configuration that resets the scene to its default state.

Please refer to the isaaclab.managers.EventManager class for more details.

rerender_on_reset_: bool_#

Whether a render step is performed again after at least one environment has been reset. Defaults to False, which means no render step will be performed after reset.

wait_for_textures_: bool_#

True to wait for assets to be loaded completely, False otherwise. Defaults to True.

xr_: XrCfg | None_#

Configuration for viewing and interacting with the environment through an XR device.

terminations_: object_#

Termination settings.

Please refer to the isaaclab.managers.TerminationManager class for more details.

curriculum_: object | None_#

Curriculum settings. Defaults to None, in which case no curriculum is applied.

Please refer to the isaaclab.managers.CurriculumManager class for more details.

commands_: object | None_#

Command settings. Defaults to None, in which case no commands are generated.

Please refer to the isaaclab.managers.CommandManager class for more details.

Direct RL Environment#

class isaaclab.envs.DirectRLEnv[source]#

Bases: Env

The superclass for the direct workflow to design environments.

This class implements the core functionality for reinforcement learning (RL) environments. It is designed to be used with any RL library. The class is designed to be used with vectorized environments, i.e., the environment is expected to be run in parallel with multiple sub-environments.

While the environment itself is implemented as a vectorized environment, we do not inherit from gym.vector.VectorEnv. This is mainly because the class adds various methods (for wait and asynchronous updates) which are not required. Additionally, each RL library typically has its own definition for a vectorized environment. Thus, to reduce complexity, we directly use the gym.Env over here and leave it up to library-defined wrappers to take care of wrapping this environment for their agents.

Note

For vectorized environments, it is recommended to only call the reset() method once before the first call to step(), i.e. after the environment is created. After that, the step() function handles the reset of terminated sub-environments. This is because the simulator does not support resetting individual sub-environments in a vectorized environment.
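A hedged skeleton of a direct-workflow environment is sketched below. The private override names (_setup_scene, _pre_physics_step, _apply_action, _get_observations, _get_rewards, _get_dones, _reset_idx) follow the direct-workflow convention but are assumptions here; see the Task Design Workflows section for the authoritative list.

    import torch

    from isaaclab.envs import DirectRLEnv

    class MyDirectEnv(DirectRLEnv):
        """Hedged sketch; method names and bodies are placeholders."""

        def _setup_scene(self):
            # spawn/clone the assets that make up the scene
            ...

        def _pre_physics_step(self, actions: torch.Tensor):
            # cache or pre-process the raw policy actions once per environment step
            self._actions = actions.clone()

        def _apply_action(self):
            # write the cached actions to the articulation at every decimated physics step
            ...

        def _get_observations(self) -> dict:
            # return a dictionary of observation tensors, e.g. {"policy": obs}
            ...

        def _get_rewards(self) -> torch.Tensor:
            ...

        def _get_dones(self) -> tuple[torch.Tensor, torch.Tensor]:
            # (terminated, truncated) tensors of shape (num_envs,)
            ...

        def _reset_idx(self, env_ids):
            # reset per-environment buffers for the given indices
            ...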

Attributes:

Methods:

is_vector_env_: ClassVar[bool]_ = True#

Whether the environment is a vectorized environment.

metadata_: ClassVar[dict[str, Any]]_ = {'isaac_sim_version': isaacsim.core.version.get_version, 'render_modes': [None, 'human', 'rgb_array']}#

Metadata for the environment.

__init__(cfg: DirectRLEnvCfg, render_mode: str | None = None, **kwargs)[source]#

Initialize the environment.

Parameters:

Raises:

RuntimeError – If a simulation context already exists. The environment must always create one since it configures the simulation context and controls the simulation.

property num_envs_: int_#

The number of instances of the environment that are running.

property physics_dt_: float_#

The physics time-step (in s).

This is the smallest time-step at which the physics simulation advances.

property step_dt_: float_#

The environment stepping time-step (in s).

This is the time-step at which the environment steps forward.

property device#

The device on which the environment is running.

property max_episode_length_s_: float_#

Maximum episode length in seconds.

property max_episode_length#

The maximum episode length in environment steps, derived from max_episode_length_s.

reset(seed: int | None = None, options: dict[str, Any] | None = None) → tuple[Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]], dict][source]#

Resets all the environments and returns observations.

This function calls the _reset_idx() function to reset all the environments. However, certain operations, such as procedural terrain generation, that happened during initialization are not repeated.

Parameters:

Returns:

A tuple containing the observations and extras.

step(action: torch.Tensor) → tuple[Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]], torch.Tensor, torch.Tensor, torch.Tensor, dict][source]#

Execute one time-step of the environment’s dynamics.

The environment steps forward at a fixed time-step, while the physics simulation is decimated at a lower time-step. This is to ensure that the simulation is stable. These two time-steps can be configured independently using the DirectRLEnvCfg.decimation (number of simulation steps per environment step) and the DirectRLEnvCfg.sim.physics_dt (physics time-step). Based on these parameters, the environment time-step is computed as the product of the two.

This function performs the following steps:

  1. Pre-process the actions before stepping through the physics.
  2. Apply the actions to the simulator and step through the physics in a decimated manner.
  3. Compute the reward and done signals.
  4. Reset environments that have terminated or reached the maximum episode length.
  5. Apply interval events if they are enabled.
  6. Compute observations.

Parameters:

action – The actions to apply on the environment. Shape is (num_envs, action_dim).

Returns:

A tuple containing the observations, rewards, resets (terminated and truncated) and extras.

static seed(seed: int = -1) → int[source]#

Set the seed for the environment.

Parameters:

seed – The seed for random generator. Defaults to -1.

Returns:

The seed used for random generator.

render(recompute: bool = False) → np.ndarray | None[source]#

Run rendering without stepping through the physics.

By convention, if mode is:

Parameters:

recompute – Whether to force a render even if the simulator has already rendered the scene. Defaults to False.

Returns:

The rendered image as a numpy array if mode is “rgb_array”. Otherwise, returns None.

Raises:

close()[source]#

Cleanup for the environment.

set_debug_vis(debug_vis: bool) → bool[source]#

Toggles the environment debug visualization.

Parameters:

debug_vis – Whether to visualize the environment debug visualization.

Returns:

Whether the debug visualization was successfully set. False if the environment does not support debug visualization.

get_wrapper_attr(name: str) → Any#

Gets the attribute name from the environment.

has_wrapper_attr(name: str) → bool#

Checks if the attribute name exists in the environment.

property np_random_: numpy.random.Generator_#

Returns the environment's internal _np_random generator. If it is not set, it will be initialised with a random seed.

Returns:

Instances of np.random.Generator

property np_random_seed_: int_#

Returns the environment's internal _np_random_seed. If it is not set, it will first be initialised with a random integer seed.

If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.

Returns:

the seed of the current np_random or -1, if the seed of the rng is unknown

Return type:

int

set_wrapper_attr(name: str, value: Any, *, force: bool = True) → bool#

Sets the attribute name on the environment with value, see Wrapper.set_wrapper_attr for more info.

property unwrapped_: Env[ObsType, ActType]_#

Returns the base non-wrapped environment.

Returns:

The base non-wrapped gymnasium.Env instance

Return type:

Env

class isaaclab.envs.DirectRLEnvCfg[source]#

Bases: object

Configuration for an RL environment defined with the direct workflow.

Please refer to the isaaclab.envs.direct_rl_env.DirectRLEnv class for more details.

Attributes:

Classes:

viewer_: ViewerCfg_#

Viewer configuration. Default is ViewerCfg().

sim_: SimulationCfg_#

Physics simulation configuration. Default is SimulationCfg().

ui_window_class_type#

alias of BaseEnvWindow

seed_: int | None_#

The seed for the random number generator. Defaults to None, in which case the seed is not set.

Note

The seed is set at the beginning of the environment initialization. This ensures that the environment creation is deterministic and behaves similarly across different runs.

decimation_: int_#

Number of control action updates @ sim dt per policy dt.

For instance, if the simulation dt is 0.01s and the policy dt is 0.1s, then the decimation is 10. This means that the control action is updated every 10 simulation steps.

is_finite_horizon_: bool_#

Whether the learning task is treated as a finite or infinite horizon problem for the agent. Defaults to False, which means the task is treated as an infinite horizon problem.

This flag handles the subtleties of finite and infinite horizon tasks:

If True, then the environment is treated as a finite horizon problem and no time-out (or truncated) done signal is sent to the agent. If False, then the environment is treated as an infinite horizon problem and a time-out (or truncated) done signal is sent to the agent.

Note

The base ManagerBasedRLEnv class does not use this flag directly. It is used by the environment wrappers to determine what type of done signal to send to the corresponding learning agent.

episode_length_s_: float_#

Duration of an episode (in seconds).

Based on the decimation rate and physics time step, the episode length is calculated as:

episode_length_steps = ceil(episode_length_s / (decimation_rate * physics_time_step))

For example, if the decimation rate is 10, the physics time step is 0.01, and the episode length is 10 seconds, then the episode length in steps is 100.

scene_: InteractiveSceneCfg_#

Scene settings.

Please refer to the isaaclab.scene.InteractiveSceneCfg class for more details.

events_: object | None_#

Event settings. Defaults to None, in which case no events are applied through the event manager.

Please refer to the isaaclab.managers.EventManager class for more details.

observation_space_: SpaceType_#

Observation space definition.

The space can be defined either using Gymnasium spaces (when a more detailed specification of the space is desired) or basic Python data types (for simplicity).

num_observations_: int | None_#

The dimension of the observation space from each environment instance.

Warning

This attribute is deprecated. Use observation_space instead.

state_space_: SpaceType | None_#

State space definition.

This is useful for asymmetric actor-critic and defines the observation space for the critic.

The space can be defined either using Gymnasium spaces (when a more detailed specification of the space is desired) or basic Python data types (for simplicity).

num_states_: int | None_#

The dimension of the state-space from each environment instance.

Warning

This attribute is deprecated. Use state_space instead.

observation_noise_model_: NoiseModelCfg | None_#

The noise model to apply to the computed observations from the environment. Default is None, which means no noise is added.

Please refer to the isaaclab.utils.noise.NoiseModel class for more details.

action_space_: SpaceType_#

Action space definition.

The space can be defined either using Gymnasium spaces (when a more detailed specification of the space is desired) or basic Python data types (for simplicity).
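A hedged sketch of how the space fields above can be filled in is shown below. The configclass decorator and the omitted required fields (sim, scene, decimation, episode_length_s, etc.) are assumptions here; only the space-related attributes documented above are illustrated.

    import gymnasium as gym

    from isaaclab.envs import DirectRLEnvCfg
    from isaaclab.utils import configclass

    @configclass
    class CartpoleLikeEnvCfg(DirectRLEnvCfg):   # hypothetical task config
        # basic Python types, per the note above: flat spaces of the given dimension
        action_space = 1
        observation_space = 4
        # or a full Gymnasium space, e.g. for the asymmetric-critic state
        state_space = gym.spaces.Box(low=-float("inf"), high=float("inf"), shape=(6,))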

num_actions_: int | None_#

The dimension of the action space for each environment.

Warning

This attribute is deprecated. Use action_space instead.

action_noise_model_: NoiseModelCfg | None_#

The noise model applied to the actions provided to the environment. Default is None, which means no noise is added.

Please refer to the isaaclab.utils.noise.NoiseModel class for more details.

rerender_on_reset_: bool_#

Whether a render step is performed again after at least one environment has been reset. Defaults to False, which means no render step will be performed after reset.

wait_for_textures_: bool_#

True to wait for assets to be loaded completely, False otherwise. Defaults to True.

xr_: XrCfg | None_#

Configuration for viewing and interacting with the environment through an XR device.

Direct Multi-Agent RL Environment#

class isaaclab.envs.DirectMARLEnv[source]#

Bases: Env

The superclass for the direct workflow to design multi-agent environments.

This class implements the core functionality for multi-agent reinforcement learning (MARL) environments. It is designed to be used with any RL library. The class is designed to be used with vectorized environments, i.e., the environment is expected to be run in parallel with multiple sub-environments.

The design of this class is based on the PettingZoo Parallel API. While the environment itself is implemented as a vectorized environment, we do not inherit from pettingzoo.ParallelEnv or gym.vector.VectorEnv. This is mainly because the class adds various attributes and methods that are inconsistent with them.

Note

For vectorized environments, it is recommended to only call the reset() method once before the first call to step(), i.e. after the environment is created. After that, the step() function handles the reset of terminated sub-environments. This is because the simulator does not support resetting individual sub-environments in a vectorized environment.

Attributes:

Methods:

metadata_: ClassVar[dict[str, Any]]_ = {'isaac_sim_version': isaacsim.core.version.get_version, 'render_modes': [None, 'human', 'rgb_array']}#

Metadata for the environment.

__init__(cfg: DirectMARLEnvCfg, render_mode: str | None = None, **kwargs)[source]#

Initialize the environment.

Parameters:

Raises:

RuntimeError – If a simulation context already exists. The environment must always create one since it configures the simulation context and controls the simulation.

property num_envs_: int_#

The number of instances of the environment that are running.

property num_agents_: int_#

Number of current agents.

The number of current agents may change as the environment progresses (e.g., agents can be added or removed).

property max_num_agents_: int_#

Number of all possible agents the environment can generate.

This value remains constant as the environment progresses.

property unwrapped_: DirectMARLEnv_#

Get the unwrapped environment underneath all the layers of wrappers.

property physics_dt_: float_#

The physics time-step (in s).

This is the smallest time-step at which the physics simulation advances.

property step_dt_: float_#

The environment stepping time-step (in s).

This is the time-step at which the environment steps forward.

property device#

The device on which the environment is running.

property max_episode_length_s_: float_#

Maximum episode length in seconds.

property max_episode_length#

The maximum episode length in environment steps, derived from max_episode_length_s.

observation_space(agent: AgentID) → Space[source]#

Get the observation space for the specified agent.

Returns:

The agent’s observation space.

action_space(agent: AgentID) → Space[source]#

Get the action space for the specified agent.

Returns:

The agent’s action space.

reset(seed: int | None = None, options: dict[str, Any] | None = None) → tuple[dict[AgentID, ObsType], dict[AgentID, dict]][source]#

Resets all the environments and returns observations.

Parameters:

Returns:

A tuple containing the observations and extras (keyed by the agent ID).

step(actions: dict[AgentID, ActionType]) → tuple[Dict[AgentID, ObsType], Dict[AgentID, torch.Tensor], Dict[AgentID, torch.Tensor], Dict[AgentID, torch.Tensor], Dict[AgentID, dict]][source]#

Execute one time-step of the environment’s dynamics.

The environment steps forward at a fixed time-step, while the physics simulation is decimated at a lower time-step. This is to ensure that the simulation is stable. These two time-steps can be configured independently using the DirectMARLEnvCfg.decimation (number of simulation steps per environment step) and the DirectMARLEnvCfg.sim.physics_dt (physics time-step). Based on these parameters, the environment time-step is computed as the product of the two.

This function performs the following steps:

  1. Pre-process the actions before stepping through the physics.
  2. Apply the actions to the simulator and step through the physics in a decimated manner.
  3. Compute the reward and done signals.
  4. Reset environments that have terminated or reached the maximum episode length.
  5. Apply interval events if they are enabled.
  6. Compute observations.

Parameters:

actions – The actions to apply on the environment (keyed by the agent ID). Shape of individual tensors is (num_envs, action_dim).

Returns:

A tuple containing the observations, rewards, resets (terminated and truncated) and extras (keyed by the agent ID).
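A hedged multi-agent rollout sketch follows. The environment instance is assumed to exist, the possible_agents attribute is assumed to be exposed on the environment (following the PettingZoo convention), and the per-agent action spaces are assumed to be Box spaces.

    import torch

    # `env` is assumed to be an initialized DirectMARLEnv.
    obs, extras = env.reset()

    for _ in range(1000):
        # one action tensor per agent, keyed by agent id
        actions = {
            agent: torch.zeros(env.num_envs, env.action_space(agent).shape[0], device=env.device)
            for agent in env.possible_agents
        }
        obs, rewards, terminated, truncated, extras = env.step(actions)
        # every returned mapping is keyed by agent id, e.g. rewards[some_agent_id]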

state() → StateType | None[source]#

Returns the state for the environment.

The state-space is used for centralized training or asymmetric actor-critic architectures. It is configured using the DirectMARLEnvCfg.state_space parameter.

Returns:

The states for the environment, or None if DirectMARLEnvCfg.state_space parameter is zero.

static seed(seed: int = -1) → int[source]#

Set the seed for the environment.

Parameters:

seed – The seed for random generator. Defaults to -1.

Returns:

The seed used for random generator.

render(recompute: bool = False) → np.ndarray | None[source]#

Run rendering without stepping through the physics.

By convention, if mode is:

Parameters:

recompute – Whether to force a render even if the simulator has already rendered the scene. Defaults to False.

Returns:

The rendered image as a numpy array if mode is “rgb_array”. Otherwise, returns None.

Raises:

close()[source]#

Cleanup for the environment.

set_debug_vis(debug_vis: bool) → bool[source]#

Toggles the environment debug visualization.

Parameters:

debug_vis – Whether to visualize the environment debug visualization.

Returns:

Whether the debug visualization was successfully set. False if the environment does not support debug visualization.

get_wrapper_attr(name: str) → Any#

Gets the attribute name from the environment.

has_wrapper_attr(name: str) → bool#

Checks if the attribute name exists in the environment.

property np_random_: numpy.random.Generator_#

Returns the environment's internal _np_random generator. If it is not set, it will be initialised with a random seed.

Returns:

Instances of np.random.Generator

property np_random_seed_: int_#

Returns the environment's internal _np_random_seed. If it is not set, it will first be initialised with a random integer seed.

If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.

Returns:

the seed of the current np_random or -1, if the seed of the rng is unknown

Return type:

int

set_wrapper_attr(name: str, value: Any, *, force: bool = True) → bool#

Sets the attribute name on the environment with value, see Wrapper.set_wrapper_attr for more info.

class isaaclab.envs.DirectMARLEnvCfg[source]#

Bases: object

Configuration for a MARL environment defined with the direct workflow.

Please refer to the isaaclab.envs.direct_marl_env.DirectMARLEnv class for more details.

Attributes:

Classes:

viewer_: ViewerCfg_#

Viewer configuration. Default is ViewerCfg().

sim_: SimulationCfg_#

Physics simulation configuration. Default is SimulationCfg().

ui_window_class_type#

alias of BaseEnvWindow

seed_: int | None_#

The seed for the random number generator. Defaults to None, in which case the seed is not set.

Note

The seed is set at the beginning of the environment initialization. This ensures that the environment creation is deterministic and behaves similarly across different runs.

decimation_: int_#

Number of control action updates @ sim dt per policy dt.

For instance, if the simulation dt is 0.01s and the policy dt is 0.1s, then the decimation is 10. This means that the control action is updated every 10 simulation steps.

is_finite_horizon_: bool_#

Whether the learning task is treated as a finite or infinite horizon problem for the agent. Defaults to False, which means the task is treated as an infinite horizon problem.

This flag handles the subtleties of finite and infinite horizon tasks:

If True, then the environment is treated as a finite horizon problem and no time-out (or truncated) done signal is sent to the agent. If False, then the environment is treated as an infinite horizon problem and a time-out (or truncated) done signal is sent to the agent.

Note

The base ManagerBasedRLEnv class does not use this flag directly. It is used by the environment wrappers to determine what type of done signal to send to the corresponding learning agent.

episode_length_s_: float_#

Duration of an episode (in seconds).

Based on the decimation rate and physics time step, the episode length is calculated as:

episode_length_steps = ceil(episode_length_s / (decimation_rate * physics_time_step))

For example, if the decimation rate is 10, the physics time step is 0.01, and the episode length is 10 seconds, then the episode length in steps is 100.

scene_: InteractiveSceneCfg_#

Scene settings.

Please refer to the isaaclab.scene.InteractiveSceneCfg class for more details.

events_: object_#

Event settings. Defaults to None, in which case no events are applied through the event manager.

Please refer to the isaaclab.managers.EventManager class for more details.

observation_spaces_: dict[AgentID, SpaceType]_#

Observation space definition for each agent.

The space can be defined either using Gymnasium spaces (when a more detailed specification of the space is desired) or basic Python data types (for simplicity).

num_observations_: dict[AgentID, int] | None_#

The dimension of the observation space for each agent.

Warning

This attribute is deprecated. Use observation_spaces instead.

state_space_: SpaceType_#

State space definition.

The following values are supported:

The space can be defined either using Gymnasium spaces (when a more detailed specification of the space is desired) or basic Python data types (for simplicity).

num_states_: int | None_#

The dimension of the state space from each environment instance.

Warning

This attribute is deprecated. Use state_space instead.

observation_noise_model_: dict[AgentID, isaaclab.utils.noise.noise_cfg.NoiseModelCfg | None] | None_#

The noise model to apply to the computed observations from the environment. Default is None, which means no noise is added.

Please refer to the isaaclab.utils.noise.NoiseModel class for more details.

action_spaces_: dict[AgentID, SpaceType]_#

Action space definition for each agent.

The space can be defined either using Gymnasium spaces (when a more detailed specification of the space is desired) or basic Python data types (for simplicity).

num_actions_: dict[AgentID, int] | None_#

The dimension of the action space for each agent.

Warning

This attribute is deprecated. Use action_spaces instead.

action_noise_model_: dict[AgentID, isaaclab.utils.noise.noise_cfg.NoiseModelCfg | None] | None_#

The noise model applied to the actions provided to the environment. Default is None, which means no noise is added.

Please refer to the isaaclab.utils.noise.NoiseModel class for more details.

possible_agents_: list[AgentID]_#

A list of all possible agents the environment could generate.

The contents of the list cannot be modified during the entire training process.

xr_: XrCfg | None_#

Configuration for viewing and interacting with the environment through an XR device.

Mimic Environment#

class isaaclab.envs.ManagerBasedRLMimicEnv[source]#

Bases: ManagerBasedRLEnv

The superclass for the Isaac Lab Mimic environments.

This class inherits from ManagerBasedRLEnv and provides a template for the functions that need to be defined to run the Isaac Lab Mimic data generation workflow. The Isaac Lab data generation pipeline, inspired by the MimicGen system, enables the generation of new datasets based on a few human-collected demonstrations. MimicGen is a novel approach designed to automatically synthesize large-scale, rich datasets from a sparse set of human demonstrations by adapting them to new contexts. It manages to replicate the benefits of large datasets while reducing the immense time and effort usually required to gather extensive human demonstrations.

The MimicGen system works by parsing demonstrations into object-centric segments. It then adapts these segments to new scenes by transforming each segment according to the new scene’s context, stitching them into a coherent trajectory for a robotic end-effector to execute. This approach allows learners to train proficient agents through imitation learning on diverse configurations of scenes, object instances, etc.

Key Features:

Methods:

Attributes:

get_robot_eef_pose(eef_name: str, env_ids: Sequence[int] | None = None) → torch.Tensor[source]#

Get current robot end effector pose. Should be the same frame as used by the robot end-effector controller.

Parameters:

Returns:

A torch.Tensor eef pose matrix. Shape is (len(env_ids), 4, 4)

target_eef_pose_to_action(target_eef_pose_dict: dict, gripper_action_dict: dict, action_noise_dict: dict | None = None, env_id: int = 0) → torch.Tensor[source]#

Takes a target pose and gripper action for the end-effector controller and returns an action (usually a normalized delta pose action) that tries to achieve the target pose. Noise is added to the target pose action if specified.

Parameters:

Returns:

An action torch.Tensor that’s compatible with env.step().

action_to_target_eef_pose(action: torch.Tensor) → dict[str, torch.Tensor][source]#

Converts action (compatible with env.step) to a target pose for the end effector controller. Inverse of @target_eef_pose_to_action. Usually used to infer a sequence of target controller poses from a demonstration trajectory using the recorded actions.

Parameters:

action – Environment action. Shape is (num_envs, action_dim).

Returns:

A dictionary of eef pose torch.Tensor that @action corresponds to.

actions_to_gripper_actions(actions: torch.Tensor) → dict[str, torch.Tensor][source]#

Extracts the gripper actuation part from a sequence of env actions (compatible with env.step).

Parameters:

actions – environment actions. The shape is (num_envs, num steps in a demo, action_dim).

Returns:

A dictionary of torch.Tensor gripper actions. Key to each dict is an eef_name.

get_object_poses(env_ids: Sequence[int] | None = None)[source]#

Gets the pose of each object relevant to Isaac Lab Mimic data generation in the current scene.

Parameters:

env_ids – Environment indices to get the pose for. If None, all envs are considered.

Returns:

A dictionary that maps object names to object pose matrix (4x4 torch.Tensor)

get_subtask_term_signals(env_ids: Sequence[int] | None = None) → dict[str, torch.Tensor][source]#

Gets a dictionary of termination signal flags for each subtask in a task. The flag is 1 when the subtask has been completed and 0 otherwise. Implementing this method is required to enable automatic subtask term signal annotation when running the dataset annotation tool. It can be left unimplemented if manual subtask term signal annotation is intended.

Parameters:

env_ids – Environment indices to get the termination signals for. If None, all envs are considered.

Returns:

A dictionary of termination signal flags (False or True) for each subtask.

serialize()[source]#

Save all information needed to re-instantiate this environment in a dictionary. This is the same as @env_meta, the environment metadata stored in HDF5 datasets and used in utils/env_utils.py.
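A hedged skeleton tying together the overrides documented above is shown below. The method names come from this page; the bodies are placeholders, and any concrete end-effector or object names would be task-specific.

    import torch

    from isaaclab.envs import ManagerBasedRLMimicEnv

    class MyMimicEnv(ManagerBasedRLMimicEnv):
        """Hedged sketch; bodies are placeholders."""

        def get_robot_eef_pose(self, eef_name, env_ids=None) -> torch.Tensor:
            # return a (len(env_ids), 4, 4) pose matrix in the controller's frame
            ...

        def target_eef_pose_to_action(self, target_eef_pose_dict, gripper_action_dict,
                                      action_noise_dict=None, env_id=0) -> torch.Tensor:
            # convert a target pose plus gripper command into an env.step()-compatible action
            ...

        def action_to_target_eef_pose(self, action: torch.Tensor) -> dict[str, torch.Tensor]:
            # inverse of target_eef_pose_to_action
            ...

        def actions_to_gripper_actions(self, actions: torch.Tensor) -> dict[str, torch.Tensor]:
            ...

        def get_object_poses(self, env_ids=None):
            # map object names to 4x4 pose matrices
            ...

        def get_subtask_term_signals(self, env_ids=None) -> dict[str, torch.Tensor]:
            # optional: only needed for automatic subtask term signal annotation
            ...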

__init__(cfg: ManagerBasedRLEnvCfg, render_mode: str | None = None, **kwargs)#

Initialize the environment.

Parameters:

close()#

Cleanup for the environment.

property device#

The device on which the environment is running.

get_wrapper_attr(name: str) → Any#

Gets the attribute name from the environment.

has_wrapper_attr(name: str) → bool#

Checks if the attribute name exists in the environment.

is_vector_env_: ClassVar[bool]_ = True#

Whether the environment is a vectorized environment.

load_managers()#

Load the managers for the environment.

This function is responsible for creating the various managers (action, observation, events, etc.) for the environment. Since the managers require access to physics handles, they can only be created after the simulator is reset (i.e. played for the first time).

Note

In the case of a standalone application (when running the simulator from Python), this function is called automatically when the class is initialized.

However, in extension mode, the user must call this function manually after the simulator is reset. This is because the simulator is only reset when the user calls SimulationContext.reset_async(), and it is not possible to call async functions in the constructor.

property max_episode_length_: int_#

Maximum episode length in environment steps.

property max_episode_length_s_: float_#

Maximum episode length in seconds.

metadata_: ClassVar[dict[str, Any]]_ = {'isaac_sim_version': isaacsim.core.version.get_version, 'render_modes': [None, 'human', 'rgb_array']}#

Metadata for the environment.

property np_random_: numpy.random.Generator_#

Returns the environment's internal _np_random generator. If it is not set, it will be initialised with a random seed.

Returns:

Instances of np.random.Generator

property np_random_seed_: int_#

Returns the environment's internal _np_random_seed. If it is not set, it will first be initialised with a random integer seed.

If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.

Returns:

the seed of the current np_random or -1, if the seed of the rng is unknown

Return type:

int

property num_envs_: int_#

The number of instances of the environment that are running.

property physics_dt_: float_#

The physics time-step (in s).

This is the smallest time-step at which the physics simulation advances.

render(recompute: bool = False) → np.ndarray | None#

Run rendering without stepping through the physics.

By convention, if mode is:

Parameters:

recompute – Whether to force a render even if the simulator has already rendered the scene. Defaults to False.

Returns:

The rendered image as a numpy array if mode is “rgb_array”. Otherwise, returns None.

Raises:

reset(seed: int | None = None, env_ids: Sequence[int] | None = None, options: dict[str, Any] | None = None) → tuple[Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]], dict]#

Resets the specified environments and returns observations.

This function calls the _reset_idx() function to reset the specified environments. However, certain operations, such as procedural terrain generation, that happened during initialization are not repeated.

Parameters:

Returns:

A tuple containing the observations and extras.

reset_to(state: dict[str, dict[str, dict[str, torch.Tensor]]], env_ids: Sequence[int] | None, seed: int | None = None, is_relative: bool = False)#

Resets specified environments to provided states.

This function resets the environments to the provided states. The state is a dictionary containing the state of the scene entities. Please refer to InteractiveScene.get_state() for the format.

The function is different from the reset() function as it resets the environments to specific states, instead of using the randomization events for resetting the environments.

Parameters:

static seed(seed: int = -1) → int#

Set the seed for the environment.

Parameters:

seed – The seed for random generator. Defaults to -1.

Returns:

The seed used for random generator.

set_wrapper_attr(name: str, value: Any, *, force: bool = True) → bool#

Sets the attribute name on the environment with value, see Wrapper.set_wrapper_attr for more info.

setup_manager_visualizers()#

Creates live visualizers for manager terms.

step(action: torch.Tensor) → tuple[Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]], torch.Tensor, torch.Tensor, torch.Tensor, dict]#

Execute one time-step of the environment’s dynamics and reset terminated environments.

Unlike ManagerBasedEnv.step(), this function performs the following operations:

  1. Process the actions.
  2. Perform physics stepping.
  3. Perform rendering if gui is enabled.
  4. Update the environment counters and compute the rewards and terminations.
  5. Reset the environments that terminated.
  6. Compute the observations.
  7. Return the observations, rewards, resets and extras.

Parameters:

action – The actions to apply on the environment. Shape is (num_envs, action_dim).

Returns:

A tuple containing the observations, rewards, resets (terminated and truncated) and extras.

property step_dt_: float_#

The environment stepping time-step (in s).

This is the time-step at which the environment steps forward.

property unwrapped_: Env[ObsType, ActType]_#

Returns the base non-wrapped environment.

Returns:

The base non-wrapped gymnasium.Env instance

Return type:

Env

cfg_: ManagerBasedRLEnvCfg_#

Configuration for the environment.

class isaaclab.envs.MimicEnvCfg[source]#

Bases: object

Configuration class for the Mimic environment integration.

This class consolidates various configuration aspects for the Isaac Lab Mimic data generation pipeline.

class isaaclab.envs.SubTaskConfig[source]#

Bases: object

Configuration settings for specifying subtasks used in Mimic environments.

Attributes:

object_ref_: str_#

Reference to the object involved in this subtask.

Set to None if no object is involved (this is rarely the case).

subtask_term_signal_: str_#

Subtask termination signal name.

selection_strategy_: str_#

Strategy for selecting a subtask segment.

Can be one of:

Note

For ‘nearest_neighbor_object’ and ‘nearest_neighbor_robot_distance’, the subtask needs to have ‘object_ref’ set to a value other than ‘None’. These strategies typically yield higher success rates than the default ‘random’ strategy when object_ref is set.

selection_strategy_kwargs_: dict_#

Additional arguments to the selected strategy. See details on each strategy in source/isaaclab_mimic/isaaclab_mimic/datagen/selection_strategy.py. The arguments are passed through to the select_source_demo method.

first_subtask_start_offset_range_: tuple_#

Range for start offset of the first subtask.

subtask_term_offset_range_: tuple_#

Range for offsetting subtask termination.

action_noise_: float_#

Amplitude of action noise applied.

num_interpolation_steps_: int_#

Number of steps for interpolation between waypoints.

num_fixed_steps_: int_#

Number of fixed steps for the subtask.

apply_noise_during_interpolation_: bool_#

Whether to apply noise during interpolation.

description_: str_#

Description of the subtask.

next_subtask_description_: str_#

Instructions for the next subtask.

class isaaclab.envs.SubTaskConstraintConfig[source]#

Bases: object

Configuration settings for specifying subtask constraints used in multi-eef Mimic environments.

Attributes:

Methods:

eef_subtask_constraint_tuple_: list[tuple[str, int]]_#

List of associated subtasks tuples in order.

The first element of the tuple refers to the eef name. The second element of the tuple refers to the subtask index of the eef.

constraint_type_: SubTaskConstraintType_#

Type of constraint to apply between subtasks.

sequential_min_time_diff_: int_#

Minimum time difference between two sequential subtasks finishing.

The second subtask executes until sequential_min_time_diff steps remain in its subtask trajectory, then waits until the first (preconditioned) subtask finishes before executing the rest. If set to -1, the second subtask starts only after the first subtask is finished.

coordination_scheme_: SubTaskConstraintCoordinationScheme_#

Scheme to use for coordinating subtasks.

coordination_scheme_pos_noise_scale_: float_#

Scale of position noise to apply during coordination.

coordination_scheme_rot_noise_scale_: float_#

Scale of rotation noise to apply during coordination.

coordination_synchronize_start_: bool_#

Whether subtasks should start at the same time.

generate_runtime_subtask_constraints()[source]#

Populate the expanded task constraints dictionary based on the task constraint config. The task constraint config contains the configurations set by the user, while the task_constraints_dict contains the flags used to implement the constraint logic in this class.

The task_constraint_configs may include the following types:

  * "sequential"
  * "coordination"

For a “sequential” constraint:

For a “coordination” constraint:

Common#

class isaaclab.envs.ViewerCfg[source]#

Configuration of the scene viewport camera.

Attributes:

eye_: tuple[float, float, float]_#

Initial camera position (in m). Default is (7.5, 7.5, 7.5).

lookat_: tuple[float, float, float]_#

Initial camera target position (in m). Default is (0.0, 0.0, 0.0).

cam_prim_path_: str_#

The camera prim path to record images from. Default is “/OmniverseKit_Persp”, which is the default camera in the viewport.

resolution_: tuple[int, int]_#

The resolution (width, height) of the camera specified using cam_prim_path. Default is (1280, 720).

origin_type_: Literal['world', 'env', 'asset_root', 'asset_body']_#

The frame in which the camera position (eye) and target (lookat) are defined in. Default is “world”.

Available options are:

env_index_: int_#

The environment index for frame origin. Default is 0.

This quantity is only effective if origin is set to “env” or “asset_root”.

asset_name_: str | None_#

The asset name in the interactive scene for the frame origin. Default is None.

This quantity is only effective if origin is set to “asset_root”.

body_name_: str | None_#

The name of the body in asset_name in the interactive scene for the frame origin. Default is None.

This quantity is only effective if origin is set to “asset_body”.
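A hedged configuration sketch for the viewport camera follows; the asset name is hypothetical and must correspond to an asset present in the interactive scene.

    from isaaclab.envs import ViewerCfg

    viewer = ViewerCfg(
        eye=(3.0, 3.0, 2.0),
        lookat=(0.0, 0.0, 0.5),
        origin_type="asset_root",   # track the root frame of a scene asset
        env_index=0,
        asset_name="robot",         # hypothetical asset name in the scene
        resolution=(1280, 720),
    )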