Dictionary Observations by J-Travnik · Pull Request #243 · DLR-RM/stable-baselines3 (original) (raw)
@@ -4,23 +4,45 @@ Changelog
==========
Release 1.1.0a5 (WIP)
Release 1.1.0a6 (WIP)
---------------------------
**Dict observation support, timeout handling and refactored HER**
Breaking Changes:
^^^^^^^^^^^^^^^^^
- All customs environments (e.g. the ``BitFlippingEnv`` or ``IdentityEnv``) were moved to ``stable_baselines3.common.envs`` folder
- Refactored ``HER`` which is now the ``HerReplayBuffer`` class that can be passed to any off-policy algorithm
- Handle timeout termination properly for off-policy algorithms (when using ``TimeLimit``)
- Renamed ``_last_dones`` and ``dones`` to ``_last_episode_starts`` and ``episode_starts`` in ``RolloutBuffer``.
- Removed ``ObsDictWrapper`` as ``Dict`` observation spaces are now supported
.. code-block:: python
her_kwargs = dict(n_sampled_goal=2, goal_selection_strategy="future", online_sampling=True)
# SB3 < 1.1.0
# model = HER("MlpPolicy", env, model_class=SAC, **her_kwargs)
# SB3 >= 1.1.0:
model = SAC("MultiInputPolicy", env, replay_buffer_class=HerReplayBuffer, replay_buffer_kwargs=her_kwargs)
- Updated the KL Divergence estimator in the PPO algorithm to be positive definite and have lower variance (@09tangriro)
- Updated the KL Divergence check in the PPO algorithm to be before the gradient update step rather than after end of epoch (@09tangriro)
New Features:
^^^^^^^^^^^^^
- Added support for single-level ``Dict`` observation space (@JadenTravnik)
- Added ``DictRolloutBuffer`` ``DictReplayBuffer`` to support dictionary observations (@JadenTravnik)
- Added ``StackedObservations`` and ``StackedDictObservations`` that are used within ``VecFrameStack``
- Added simple 4x4 room Dict test environments
- ``HerReplayBuffer`` now supports ``VecNormalize`` when ``online_sampling=False``
- Added `VecMonitor https://github.com/DLR-RM/stable-baselines3/blob/master/stable\_baselines3/common/vec\_env/vec\_monitor.py`_ and
`VecExtractDictObs https://github.com/DLR-RM/stable-baselines3/blob/master/stable\_baselines3/common/vec\_env/vec\_extract\_dict\_obs.py`_ wrappers
to handle gym3-style vectorized environments (@vwxyzjn)
- Ignored the terminal observation if the it is not provided by the environment
such as the gym3-style vectorized environments. (@vwxyzjn)
- Add policy_base as input to the OnPolicyAlgorithm for more flexibility (@09tangriro)
- Added support for image observation when using ``HER``
- Added ``replay_buffer_class`` and ``replay_buffer_kwargs`` arguments to off-policy algorithms
Bug Fixes:
^^^^^^^^^^
@@ -34,9 +56,9 @@ Deprecations:
Others:
^^^^^^^
- Added ``flake8-bugbear`` to tests dependencies to find likely bugs
- Updated ``env_checker`` to reflect support of dict observation spaces
- Added Code of Conduct
- Added tests for GAE and lambda return computation
- Updated docker image with newest black version
Documentation:
^^^^^^^^^^^^^^
@@ -71,6 +93,7 @@ New Features:
- Added support for ``custom_objects`` when loading models
Bug Fixes:
^^^^^^^^^^
- Fixed a bug with ``DQN`` predict method when using ``deterministic=False`` with image space
@@ -81,10 +104,14 @@ Documentation:
- Added new project using SB3: rl_reach (@PierreExeter)
- Added note about slow-down when switching to PyTorch
- Add a note on continual learning and resetting environment
Others:
^^^^^^^
- Updated RL-Zoo to reflect the fact that is it more than a collection of trained agents
- Added images to illustrate the training loop and custom policies (created with https://excalidraw.com/)
- Updated the custom policy section
Pre-Release 0.11.1 (2021-02-27)
-------------------------------
@@ -132,6 +159,7 @@ New Features:
- Added new wrappers to log images and matplotlib figures to tensorboard. (@zampanteymedio)
- Add support for text records to ``Logger``. (@lorenz-h)
Bug Fixes:
^^^^^^^^^^
- Fixed bug where code added VecTranspose on channel-first image environments (thanks @qxcv)
@@ -657,5 +685,5 @@ And all the contributors:
@flodorner @KuKuXia @NeoExtended @PartiallyTyped @mmcenta @richardwu @kinalmehta @rolandgvc @tkelestemur @mloo3
@tirafesi @blurLake @koulakis @joeljosephjin @shwang @rk37 @andyshih12 @RaphaelWag @xicocaio
@diditforlulz273 @liorcohen5 @ManifoldFR @mloo3 @SwamyDev @wmmc88 @megan-klaiber @thisray
@tfederico @hn2 @LucasAlegre @AptX395 @zampanteymedio @decodyng @ardabbour @lorenz-h @mschweizer @lorepieri8 @vwxyzjn
@tfederico @hn2 @LucasAlegre @AptX395 @zampanteymedio @JadenTravnik @decodyng @ardabbour @lorenz-h @mschweizer @lorepieri8 @vwxyzjn
@ShangqunYu @PierreExeter @JacopoPan @ltbd78 @tom-doerr @Atlis @liusida @09tangriro