Dictionary Observations by J-Travnik · Pull Request #243 · DLR-RM/stable-baselines3


Changelog
==========


Release 1.1.0a6 (WIP)
---------------------------

**Dict observation support, timeout handling and refactored HER**

Breaking Changes:
^^^^^^^^^^^^^^^^^

- All custom environments (e.g. the ``BitFlippingEnv`` or ``IdentityEnv``) were moved to the ``stable_baselines3.common.envs`` folder

- Refactored ``HER``, which is now the ``HerReplayBuffer`` class that can be passed to any off-policy algorithm

- Handled timeout termination properly for off-policy algorithms (when using ``TimeLimit``)

- Renamed ``_last_dones`` and ``dones`` to ``_last_episode_starts`` and ``episode_starts`` in ``RolloutBuffer``.

- Removed ``ObsDictWrapper`` as ``Dict`` observation spaces are now supported

.. code-block:: python

  from stable_baselines3 import SAC, HerReplayBuffer

  # `env` must be a goal-conditioned environment with a Dict observation space
  her_kwargs = dict(n_sampled_goal=2, goal_selection_strategy="future", online_sampling=True)
  # SB3 < 1.1.0
  # model = HER("MlpPolicy", env, model_class=SAC, **her_kwargs)
  # SB3 >= 1.1.0:
  model = SAC("MultiInputPolicy", env, replay_buffer_class=HerReplayBuffer, replay_buffer_kwargs=her_kwargs)
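The timeout-handling change above rests on one idea. An episode cut short by a ``TimeLimit`` wrapper has not reached a true terminal state, so the TD target should still bootstrap from the next value. Below is a minimal, stdlib-only sketch of that target. The function name ``td_targets`` and its arguments are hypothetical and not part of the SB3 API. ``timeouts`` stands in for the ``TimeLimit.truncated`` flag that gym's ``TimeLimit`` wrapper writes into ``info``.

```python
from typing import List, Sequence


def td_targets(
    rewards: Sequence[float],
    next_values: Sequence[float],
    dones: Sequence[bool],
    timeouts: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """One-step TD targets that keep bootstrapping on time-limit truncation.

    Hypothetical helper: ``dones`` marks every episode end, ``timeouts`` marks
    the ends caused only by a time limit (e.g. ``info["TimeLimit.truncated"]``).
    """
    targets = []
    for r, v_next, done, timeout in zip(rewards, next_values, dones, timeouts):
        # A timeout is not a real terminal state, so it is not treated as one.
        terminal = float(done and not timeout)
        targets.append(r + gamma * (1.0 - terminal) * v_next)
    return targets
```

For example, ``td_targets([1.0, 1.0], [10.0, 10.0], [True, True], [False, True], gamma=0.5)`` gives ``[1.0, 6.0]``. Only the truncated transition keeps the bootstrapped term.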

- Updated the KL Divergence estimator in the PPO algorithm to be non-negative and have lower variance (@09tangriro)

- Updated the KL Divergence check in the PPO algorithm so that it runs before the gradient update step rather than after the end of the epoch (@09tangriro)
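The two PPO changes above can be sketched in plain Python. Let ``r`` be the probability ratio between the new and old policies. The per-sample term ``(r - 1) - log r`` is never negative, and averaged over samples from the old policy it estimates KL(old || new). The early-stopping check then compares that estimate against ``target_kl`` before any optimizer step is taken. This is an illustrative sketch under those assumptions, not SB3's implementation. The helper names are hypothetical, and the ``1.5`` tolerance factor is an assumption.

```python
import math
from typing import Optional, Sequence


def approx_kl(log_prob_new: Sequence[float], log_prob_old: Sequence[float]) -> float:
    """Estimate KL(old || new) from log-probs of actions sampled by the old policy."""
    total = 0.0
    for lp_new, lp_old in zip(log_prob_new, log_prob_old):
        log_ratio = lp_new - lp_old
        # (r - 1) - log(r) >= 0 for every r > 0, so every term is non-negative.
        total += (math.exp(log_ratio) - 1.0) - log_ratio
    return total / len(log_prob_new)


def should_stop(kl: float, target_kl: Optional[float], tolerance: float = 1.5) -> bool:
    """Early-stopping test, meant to run before the gradient update of a minibatch."""
    return target_kl is not None and kl > tolerance * target_kl


# Inside a (hypothetical) minibatch loop:
#   kl = approx_kl(new_log_probs, old_log_probs)
#   if should_stop(kl, target_kl):
#       break  # skip this update and stop training on the current rollout
#   ...otherwise compute the loss and take the optimizer step
```

Because the check sits before the update, a minibatch that has already drifted too far never gets applied. This is the behavioral difference from checking only at the end of an epoch.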

New Features:
^^^^^^^^^^^^^

- Added support for single-level ``Dict`` observation spaces (@JadenTravnik)

- Added ``DictRolloutBuffer`` and ``DictReplayBuffer`` to support dictionary observations (@JadenTravnik)

- Added ``StackedObservations`` and ``StackedDictObservations`` that are used within ``VecFrameStack``

- Added simple 4x4 room Dict test environments

- ``HerReplayBuffer`` now supports ``VecNormalize`` when ``online_sampling=False``

- Added `VecMonitor <https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/vec_env/vec_monitor.py>`_ and
  `VecExtractDictObs <https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/vec_env/vec_extract_dict_obs.py>`_ wrappers
  to handle gym3-style vectorized environments (@vwxyzjn)

- Ignored the terminal observation if it is not provided by the environment,
  as is the case for gym3-style vectorized environments (@vwxyzjn)

- Added ``policy_base`` as input to the ``OnPolicyAlgorithm`` for more flexibility (@09tangriro)

- Added support for image observations when using ``HER``

- Added ``replay_buffer_class`` and ``replay_buffer_kwargs`` arguments to off-policy algorithms
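The new ``Dict`` buffers keep one storage per observation key instead of a single flat array. The toy sketch below illustrates that layout. It uses ``collections.deque`` storage rather than the preallocated arrays a real buffer would use. The class ``MinimalDictReplayBuffer`` and its methods are hypothetical and do not reproduce the signature of SB3's ``DictReplayBuffer``.

```python
import random
from collections import deque
from typing import Any, Deque, Dict, List, Sequence


class MinimalDictReplayBuffer:
    """Toy FIFO replay buffer keeping a separate storage per observation key."""

    def __init__(self, buffer_size: int, obs_keys: Sequence[str]) -> None:
        # deque(maxlen=...) drops the oldest item once full, so all storages
        # below stay aligned transition by transition.
        self.obs: Dict[str, Deque[Any]] = {k: deque(maxlen=buffer_size) for k in obs_keys}
        self.next_obs: Dict[str, Deque[Any]] = {k: deque(maxlen=buffer_size) for k in obs_keys}
        self.actions: Deque[Any] = deque(maxlen=buffer_size)
        self.rewards: Deque[float] = deque(maxlen=buffer_size)
        self.dones: Deque[bool] = deque(maxlen=buffer_size)

    def __len__(self) -> int:
        return len(self.actions)

    def add(self, obs: Dict[str, Any], next_obs: Dict[str, Any], action: Any, reward: float, done: bool) -> None:
        for key in self.obs:
            self.obs[key].append(obs[key])
            self.next_obs[key].append(next_obs[key])
        self.actions.append(action)
        self.rewards.append(reward)
        self.dones.append(done)

    def sample(self, batch_size: int) -> Dict[str, Any]:
        # Draw indices with replacement, then gather the same indices from every key.
        idx: List[int] = [random.randrange(len(self)) for _ in range(batch_size)]
        return {
            "observations": {k: [v[i] for i in idx] for k, v in self.obs.items()},
            "next_observations": {k: [v[i] for i in idx] for k, v in self.next_obs.items()},
            "actions": [self.actions[i] for i in idx],
            "rewards": [self.rewards[i] for i in idx],
            "dones": [self.dones[i] for i in idx],
        }
```

Sampling the same index list for every key is what keeps an observation's parts (for example ``observation``, ``achieved_goal`` and ``desired_goal`` in a goal-conditioned task) together in a batch.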

Bug Fixes:
^^^^^^^^^^


Others:
^^^^^^^

- Added ``flake8-bugbear`` to tests dependencies to find likely bugs

- Updated ``env_checker`` to reflect support of dict observation spaces

- Added Code of Conduct

- Added tests for GAE and lambda return computation

- Updated the Docker image with the newest ``black`` version
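The GAE and lambda-return tests listed above exercise a backward recursion over the rollout. A sketch for a single environment follows. It uses the ``episode_starts`` naming from the breaking changes: a new episode starting at step ``t + 1`` cuts the bootstrap from step ``t``. The sketch is written from the standard GAE definition and is only assumed to match ``RolloutBuffer`` behavior. ``compute_gae``, ``last_value`` and ``last_done`` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    episode_starts: Sequence[bool],
    last_value: float,
    last_done: bool,
    gamma: float = 0.99,
    gae_lambda: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Generalized Advantage Estimation for one environment.

    ``last_value`` is V(s) of the observation following the rollout and
    ``last_done`` tells whether that observation starts a new episode.
    """
    n_steps = len(rewards)
    advantages = [0.0] * n_steps
    last_gae_lam = 0.0
    for step in reversed(range(n_steps)):
        if step == n_steps - 1:
            next_non_terminal = 1.0 - float(last_done)
            next_value = last_value
        else:
            next_non_terminal = 1.0 - float(episode_starts[step + 1])
            next_value = values[step + 1]
        # One-step TD error, with no bootstrapping across an episode boundary.
        delta = rewards[step] + gamma * next_value * next_non_terminal - values[step]
        last_gae_lam = delta + gamma * gae_lambda * next_non_terminal * last_gae_lam
        advantages[step] = last_gae_lam
    # The lambda-return is the advantage plus the value baseline.
    returns = [adv + v for adv, v in zip(advantages, values)]
    return advantages, returns
```

With ``gae_lambda=1.0`` and zero values, the returns reduce to discounted Monte Carlo returns. For rewards ``[1, 1, 1]`` and ``gamma=0.5``, they are ``[1.75, 1.5, 1.0]``.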

Documentation:
^^^^^^^^^^^^^^


- Added support for ``custom_objects`` when loading models
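Conceptually, ``custom_objects`` lets the caller supply replacements for saved entries that should not, or cannot, be deserialized. A typical case is an object in the file that fails to load. The stdlib sketch below shows that substitution step on a JSON payload. ``load_params`` and the payload layout are hypothetical and are not SB3's file format. In SB3 itself the option is passed as the ``custom_objects`` argument when loading a model.

```python
import json
from typing import Any, Callable, Dict, Optional


def load_params(
    payload: str,
    custom_objects: Optional[Dict[str, Any]] = None,
    deserialize: Callable[[Any], Any] = lambda item: item,
) -> Dict[str, Any]:
    """Rebuild saved parameters; keys present in ``custom_objects`` skip deserialization."""
    custom_objects = custom_objects or {}
    params: Dict[str, Any] = {}
    for key, item in json.loads(payload).items():
        if key in custom_objects:
            # Use the caller's object instead of trying to restore the saved one.
            params[key] = custom_objects[key]
        else:
            params[key] = deserialize(item)
    return params


saved = json.dumps({"learning_rate": 0.0003, "lr_schedule": "<serialized function>"})
params = load_params(saved, custom_objects={"lr_schedule": lambda _: 0.0})
```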

Bug Fixes:
^^^^^^^^^^

- Fixed a bug with ``DQN`` predict method when using ``deterministic=False`` with image space


- Added new project using SB3: rl_reach (@PierreExeter)

- Added note about slow-down when switching to PyTorch

- Added a note on continual learning and resetting the environment

Others:
^^^^^^^

- Updated RL-Zoo to reflect the fact that it is more than a collection of trained agents

- Added images to illustrate the training loop and custom policies (created with https://excalidraw.com/)

- Updated the custom policy section

Pre-Release 0.11.1 (2021-02-27)
-------------------------------


- Added new wrappers to log images and matplotlib figures to tensorboard. (@zampanteymedio)

- Added support for text records to ``Logger``. (@lorenz-h)

Bug Fixes:
^^^^^^^^^^

- Fixed a bug where ``VecTranspose`` was added to channel-first image environments (thanks @qxcv)
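The fix above depends on deciding whether an image is already channel-first before transposing it. Below is a small sketch of one such rule, working on shapes only. It assumes the channel axis is the smallest of the three dimensions, which we understand to be the heuristic SB3 relies on. Ties go to the earliest axis, and a smallest middle axis is treated as channel-last. ``is_channels_first`` and ``transposed_shape`` are hypothetical helpers.

```python
from typing import Tuple


def is_channels_first(shape: Tuple[int, int, int]) -> bool:
    """Treat the smallest axis as the channel axis (ties go to the earliest axis)."""
    smallest_axis = min(range(3), key=lambda i: shape[i])
    return smallest_axis == 0


def transposed_shape(shape: Tuple[int, int, int]) -> Tuple[int, int, int]:
    """Shape after a VecTranspose-style HWC -> CHW reordering, if one is needed."""
    if is_channels_first(shape):
        # Already (C, H, W): transposing again would scramble the axes.
        return shape
    height, width, channels = shape
    return (channels, height, width)
```

For example, ``transposed_shape((84, 84, 3))`` is ``(3, 84, 84)``, while a channel-first ``(3, 84, 84)`` is returned unchanged.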


And all the contributors:

@flodorner @KuKuXia @NeoExtended @PartiallyTyped @mmcenta @richardwu @kinalmehta @rolandgvc @tkelestemur @mloo3

@tirafesi @blurLake @koulakis @joeljosephjin @shwang @rk37 @andyshih12 @RaphaelWag @xicocaio

@diditforlulz273 @liorcohen5 @ManifoldFR @mloo3 @SwamyDev @wmmc88 @megan-klaiber @thisray


@tfederico @hn2 @LucasAlegre @AptX395 @zampanteymedio @JadenTravnik @decodyng @ardabbour @lorenz-h @mschweizer @lorepieri8 @vwxyzjn

@ShangqunYu @PierreExeter @JacopoPan @ltbd78 @tom-doerr @Atlis @liusida @09tangriro