Roadmap to Stable-Baselines3 V1.0 · Issue #1 · DLR-RM/stable-baselines3
This issue will be updated over time; the list of changes below is not exhaustive.
Dear all,
Stable-Baselines3 beta is now out 🎉! This issue is meant to reference what is implemented and what is missing before a first major version.
As mentioned in the README, before v1.0, breaking changes may occur. I would like to encourage contributors (especially the maintainers) to make comments on how to improve the library before v1.0 (and maybe make some internal changes).
I will try to review the features mentioned in hill-a/stable-baselines#576 (and hill-a/stable-baselines#733)
and I will create issues soon to reference what is missing.
What is implemented?
- basic features (training/saving/loading/predict; see the usage sketch after this list)
- basic set of algorithms (A2C/PPO/SAC/TD3)
- basic pre-processing (Box and Discrete observation/action spaces are handled)
- callback support
- complete benchmark for the continuous action case
- basic RL zoo for training/evaluation/plotting (https://github.com/DLR-RM/rl-baselines3-zoo)
- consistent API
- basic tests and most type hints
- continuous integration (I'm in discussion with the organization admins for that)
- handle more observation/action spaces Add support for MultiDiscrete/MultiBinary observation spaces #4 and Add support for MultiDiscrete/MultiBinary action spaces #5 (thanks @rolandgvc)
- tensorboard integration Tensorboard integration #9 (thanks @rolandgvc)
- basic documentation and notebooks
- automatic build of the documentation
- Vanilla DQN Implement Vanilla DQN #6 (thanks @Artemis-Skade)
- Refactor off-policy critics to reduce code duplication Implement DDPG #3 (see Refactored ContinuousCritic for SAC/TD3 #78 )
- DDPG Implement DDPG #3
- do a complete benchmark for the discrete case Performance Check (Discrete actions) #49 (thanks @Miffyli !)
- performance check for continuous actions Performance check (Continuous Actions) #48 (results even better than in the gSDE paper)
- get/set parameters for the base class (Get/set parameters and review of saving and loading #138 )
- clean up type-hints in docs Custom parser for type hints #10 (cumbersome to read)
- documenting the migration between SB and SB3 Migration guide #11
- finish typing some methods Improve typing coverage #175
- HER Implement HER #8 (thanks @megan-klaiber)
- finish updating and cleaning the documentation Missing Documentation #166 (help is wanted)
- finish updating the notebooks and the tutorial Update colab notebooks #7 (I will do that, only the HER notebook is missing)
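To illustrate the basic features listed above (training/saving/loading/predict), here is a minimal usage sketch. It assumes the standard gym CartPole-v1 environment and the current beta API, so exact names may still change before v1.0:

```python
import gym

from stable_baselines3 import PPO

env = gym.make("CartPole-v1")

# Train a PPO agent with the default MLP policy
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

# Save the model and reload it (e.g. in another script)
model.save("ppo_cartpole")
model = PPO.load("ppo_cartpole")

# Use the trained policy for prediction
obs = env.reset()
for _ in range(100):
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
```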
What are the new features?
- much cleaner base code (and no more warnings =D )
- independent saving/loading/predict for policies
- State-Dependent Exploration (SDE) for using RL directly on real robots (a unique feature and the starting point of SB3; I published a paper on it: https://arxiv.org/abs/2005.05719)
- proper evaluation (using a separate env) is included in the base class (using `EvalCallback`; see the sketch after this list)
- all environments are `VecEnv`
- better saving/loading (now can include the replay buffer and the optimizers)
- any number of critics are allowed for SAC/TD3
- custom actor/critic net arch for off-policy algos ([Feature request] Allow different network architectures for off-policy actor/critic #113 )
- QR-DQN in SB3-Contrib
- Truncated Quantile Critics (TQC) (see Implement Truncated Quantile Critics (TQC) #83 ) in SB3-Contrib
- @Miffyli suggested a "contrib" repo for experimental features (it is here)
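As a rough sketch of two items above, the built-in evaluation on a separate env via `EvalCallback` and the custom actor/critic `net_arch` for off-policy algorithms, here is what this looks like with SAC; the environment and hyperparameters are placeholders, not tuned values:

```python
import gym

from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import EvalCallback

train_env = gym.make("Pendulum-v0")
eval_env = gym.make("Pendulum-v0")  # separate env, used only for evaluation

# Evaluate the agent every 5000 steps on 5 episodes and keep the best model
eval_callback = EvalCallback(
    eval_env,
    best_model_save_path="./logs/",
    eval_freq=5_000,
    n_eval_episodes=5,
    deterministic=True,
)

# Different network sizes for the actor (pi) and the critics (qf)
policy_kwargs = dict(net_arch=dict(pi=[64, 64], qf=[400, 300]))

model = SAC("MlpPolicy", train_env, policy_kwargs=policy_kwargs, verbose=1)
model.learn(total_timesteps=50_000, callback=eval_callback)

# Saving can now also include the replay buffer
model.save_replay_buffer("sac_pendulum_buffer")
```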
What is missing?
- syncing some files with Stable-Baselines to remain consistent (we may be good now, but this needs to be checked)
- finish code review of existing code Review of Existing Code #17
Checklist for v1.0 release
- Update Readme
- Prepare blog post
- Update doc: add links to the stable-baselines3 contrib
- Update docker image to use a newer Ubuntu version
- Populate RL zoo
What is next? (for V1.1+)
- basic dict/tuple support for observations (Dictionary Observations #243 )
- simple recurrent policies? (recurrent policy implementation in ppo [feature-request] #18)
- DQN extensions (double, PER, IQN) ([Feature Request] RAINBOW #622)
- Implement TRPO (Add TRPO Stable-Baselines-Team/stable-baselines3-contrib#40)
- multi-worker training for all algorithms ([Feature request] Adding multiprocessing support for off policy algorithms #179 )
- n-step returns for off-policy algorithms [feature-request] N-step returns for TD methods #47 (@partiallytyped )
- SAC discrete [Feature request] Implement SAC-Discrete #157 (needs to be discussed: benefit vs DQN + extensions?)
- Energy Based Prioritisation? (@RyanRizzo96)
- implement `action_proba` in the base class?
- test the doc snippets Sphinx doc tests support #14 (help is welcome)
- noisy networks (https://arxiv.org/abs/1706.10295) @partiallytyped ? exploration in parameter space? ([Feature Request] RAINBOW #622)
- Munchausen Reinforcement Learning (MDQN) (probably in the contrib first, e.g. [WIP] MDQN pfnet/pfrl#74)
Side note: should we change the default `start_method` to `fork`? (now that we no longer depend on TensorFlow; see the sketch below)
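A small sketch of what this looks like from the user side, assuming the existing `start_method` argument of `SubprocVecEnv` ("fork" is only available on Unix; the current default falls back to "forkserver" or "spawn"):

```python
import gym

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import SubprocVecEnv


def make_env():
    return gym.make("CartPole-v1")


if __name__ == "__main__":
    # Explicitly request "fork"; the question above is whether this
    # should become the default instead of "forkserver"/"spawn"
    env = SubprocVecEnv([make_env for _ in range(4)], start_method="fork")

    model = PPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=10_000)
```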