[Feature Request] basic dict/tuple support for observations · Issue #216 · DLR-RM/stable-baselines3
🚀 Feature
As mentioned in the roadmap, dict/tuple support for observations is a planned feature. This follows from the OpenAI Gym API, which offers Tuple and Dict as possible observation spaces.
Motivation
Currently, Stable-Baselines3 only supports a single observation (an image or a vector). Extending this to Tuple/Dict observations would add support for environments that provide several different kinds of input data.
Current Plan
I plan on implementing this feature but I'd like to have some pointers on how to go about it.
Below is my current plan but I'd really like to verify it as a good way forward.
I think that I need to create a child class of RolloutBufferSamples which stores a list/dict of observations rather than a single observation. However, this may require adding a bool on the rollout_buffer itself so that the conversion to tensor (see on_policy_algorithm.py) can be performed over each element of the list/dict. It's not my favorite approach and I'd like to avoid it if possible.
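One way the extra bool might be avoided is to branch on the observation's type at conversion time. The following is only a minimal sketch of that idea, not the library's implementation: `obs_to_tensor` and `DictRolloutSamples` are hypothetical names, and the sample container shows only two fields rather than everything RolloutBufferSamples holds.

```python
from typing import Dict, NamedTuple, Union

import numpy as np
import torch as th

# Hypothetical aliases: a single array, or one array per observation key.
ArrayObs = Union[np.ndarray, Dict[str, np.ndarray]]
TensorObs = Union[th.Tensor, Dict[str, th.Tensor]]


def obs_to_tensor(obs: ArrayObs, device: str = "cpu") -> TensorObs:
    """Convert a single observation, or a dict of observations, to tensors."""
    if isinstance(obs, dict):
        # Convert each entry independently and keep the original keys.
        return {key: th.as_tensor(value, device=device) for key, value in obs.items()}
    return th.as_tensor(obs, device=device)


class DictRolloutSamples(NamedTuple):
    """Hypothetical sample container; only a subset of fields is shown."""

    observations: Dict[str, th.Tensor]
    actions: th.Tensor


# A batch of 4 observations, each with an image part and a vector part.
batch = {
    "image": np.zeros((4, 3, 8, 8), dtype=np.float32),
    "vector": np.ones((4, 5), dtype=np.float32),
}
samples = DictRolloutSamples(
    observations=obs_to_tensor(batch),
    actions=th.zeros(4, 2),
)
print({key: tuple(tensor.shape) for key, tensor in samples.observations.items()})
```

Because the helper inspects the input itself, the same call works for the existing single-array case and for dict observations, so the buffer would not need to carry a flag. Tuple observations could be handled with an analogous branch.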
From here, I think that the other necessary changes would permeate through the repository:
- add a "CombinedExtractor" in torch_layers.py that can take in multiple observations
- add a new Policy for each algorithm to use the new extractors
- modify util.py and preprocessing.py to handle the new rollout type
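To make the "CombinedExtractor" idea concrete, here is a rough sketch of what such a module could look like in plain PyTorch. It is an illustration under assumptions, not a proposed final design: the constructor arguments (`obs_shapes`, `hidden_dim`) are made up for this example, and every input is simply flattened and passed through a small MLP, whereas image inputs would more likely go through a CNN.

```python
from typing import Dict, Tuple

import torch as th
from torch import nn


class CombinedExtractor(nn.Module):
    """Sketch: one sub-network per observation key, outputs concatenated."""

    def __init__(self, obs_shapes: Dict[str, Tuple[int, ...]], hidden_dim: int = 16):
        super().__init__()
        extractors = {}
        for key, shape in obs_shapes.items():
            # Number of features once the per-sample observation is flattened.
            flat_dim = 1
            for size in shape:
                flat_dim *= size
            # Assumption: a small MLP per key; swap in a CNN for images.
            extractors[key] = nn.Sequential(
                nn.Flatten(), nn.Linear(flat_dim, hidden_dim), nn.ReLU()
            )
        self.extractors = nn.ModuleDict(extractors)
        # Size of the concatenated feature vector seen by the policy.
        self.features_dim = hidden_dim * len(extractors)

    def forward(self, observations: Dict[str, th.Tensor]) -> th.Tensor:
        # Iterate over the ModuleDict so the concatenation order is fixed.
        features = [net(observations[key]) for key, net in self.extractors.items()]
        return th.cat(features, dim=1)


extractor = CombinedExtractor({"image": (3, 8, 8), "vector": (5,)})
obs = {"image": th.zeros(4, 3, 8, 8), "vector": th.zeros(4, 5)}
features = extractor(obs)
print(features.shape)
```

With two keys and `hidden_dim=16`, a batch of 4 observations yields a feature tensor of shape (4, 32). A policy would then only need to know `features_dim`, independent of how many inputs the environment provides.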
Is this a good approach?