stable-baselines3

🚀 Feature

As mentioned in the roadmap, support for dict/tuple observations is a planned feature. This follows from the OpenAI Gym API, which offers Tuple and Dict as possible observation spaces.

Motivation

Currently, stable-baselines3 supports only a single observation (an image or a vector). Extending this to Tuple/Dict observations would add support for environments that provide several different kinds of input data.

Current Plan

I plan to implement this feature, but I'd like some pointers on how to go about it.
Below is my current plan; I'd like to verify that it is a good way forward.

I think that I need to create a child class of RolloutBufferSamples which stores a list/dict of observations rather than a single observation.
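To make the idea concrete, here is a minimal sketch of such a container. The name DictRolloutBufferSamples and the helper take_minibatch are hypothetical, only three fields are shown, and plain Python lists stand in for tensors. It is written as a standalone NamedTuple rather than a true subclass, since a NamedTuple's fields cannot be retyped by inheritance.

```python
from typing import Dict, List, NamedTuple


class DictRolloutBufferSamples(NamedTuple):
    # Hypothetical name: same role as RolloutBufferSamples, but it keeps
    # one batch per observation key instead of a single batch.
    observations: Dict[str, List[float]]
    actions: List[int]
    returns: List[float]


def take_minibatch(
    samples: DictRolloutBufferSamples, indices: List[int]
) -> DictRolloutBufferSamples:
    # Index every observation entry with the same indices so keys stay aligned.
    return DictRolloutBufferSamples(
        observations={
            key: [values[i] for i in indices]
            for key, values in samples.observations.items()
        },
        actions=[samples.actions[i] for i in indices],
        returns=[samples.returns[i] for i in indices],
    )


samples = DictRolloutBufferSamples(
    observations={
        "position": [0.0, 0.1, 0.2, 0.3],
        "velocity": [1.0, 1.1, 1.2, 1.3],
    },
    actions=[0, 1, 1, 0],
    returns=[0.5, 0.4, 0.3, 0.2],
)
minibatch = take_minibatch(samples, [1, 3])
```

The point of the sketch is that every per-key batch is sliced with the same indices, so the rest of the training loop can keep treating a sample as one unit.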

However, this may require adding a bool to the rollout_buffer itself so that the conversion to tensor (see on_policy_algorithm.py) can be performed over each element of the list/dict. It's not my favorite approach, and I'd like to avoid it if possible.
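One possible way to avoid the extra bool is to dispatch on the type of the observation at conversion time. The sketch below is only an illustration: map_observation and to_floats are hypothetical names, and to_floats stands in for whatever tensor conversion the algorithm actually uses.

```python
from typing import Any, Callable, Dict, List, Tuple, Union

Observation = Union[Any, Dict[str, Any], Tuple[Any, ...]]


def map_observation(obs: Observation, convert: Callable[[Any], Any]) -> Observation:
    # Dispatch on the container type so no flag has to be stored on the buffer.
    if isinstance(obs, dict):
        return {key: convert(value) for key, value in obs.items()}
    if isinstance(obs, tuple):
        return tuple(convert(value) for value in obs)
    # Anything else is treated as a single observation and converted directly.
    return convert(obs)


def to_floats(values: List[int]) -> List[float]:
    # Stand-in for a real tensor conversion (assumption for this sketch).
    return [float(v) for v in values]


single = map_observation([1, 2], to_floats)
as_dict = map_observation({"image": [0, 255], "vector": [1, 2, 3]}, to_floats)
as_tuple = map_observation(([1], [2, 3]), to_floats)
```

The trade-off is that a single observation which happens to be a Python tuple would be mistaken for a Tuple observation, so the single case should be passed as an array or list.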

From here, I think that the other necessary changes would permeate through the repository:

- Add a "CombinedExtractor" in torch_layers.py that can take in multiple observations.
- Add a new Policy for each algorithm to use the new extractors.
- Modify util.py and preprocessing.py to handle the new rollout type.
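As a rough illustration of the "CombinedExtractor" item, the following PyTorch sketch builds one small sub-network per observation key and concatenates the results. It is written under assumptions, not as the library's implementation: the constructor signature, the obs_shapes argument, the hidden_dim default, and the choice of a flatten-plus-linear layer for every key (including images) are all hypothetical.

```python
import math
from typing import Dict, Tuple

import torch
from torch import nn


class CombinedExtractor(nn.Module):
    """Hypothetical sketch: one sub-network per observation key."""

    def __init__(self, obs_shapes: Dict[str, Tuple[int, ...]], hidden_dim: int = 32):
        super().__init__()
        # Assumption: every sub-observation is flattened and fed to a small MLP.
        # A real extractor would likely use a CNN for image inputs.
        self.extractors = nn.ModuleDict(
            {
                key: nn.Sequential(
                    nn.Flatten(),
                    nn.Linear(math.prod(shape), hidden_dim),
                    nn.ReLU(),
                )
                for key, shape in obs_shapes.items()
            }
        )
        # Size of the concatenated feature vector seen by the policy network.
        self.features_dim = hidden_dim * len(obs_shapes)

    def forward(self, observations: Dict[str, torch.Tensor]) -> torch.Tensor:
        # Run each sub-network on its own key, then concatenate along features.
        features = [net(observations[key]) for key, net in self.extractors.items()]
        return torch.cat(features, dim=1)


extractor = CombinedExtractor({"image": (3, 8, 8), "vector": (5,)})
batch = {"image": torch.zeros(4, 3, 8, 8), "vector": torch.zeros(4, 5)}
features = extractor(batch)
```

With this layout a policy only needs to know features_dim, so the per-algorithm policy classes could stay mostly unchanged apart from accepting a dict of observations.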

Is this a good approach?