[feature-request] N-step returns for TD methods · Issue #47 · DLR-RM/stable-baselines3
Description
Originally posted by @partiallytyped in hill-a/stable-baselines#821
"
N-step returns allow for much better stability and improve performance when training TD methods such as DQN and DDPG, so it would be quite useful to have this feature.
A simple implementation would be a wrapper around ReplayBuffer, so it would work with both prioritized and uniform sampling. The wrapper keeps a queue of observed experiences, computes the n-step returns, and adds the resulting transitions to the buffer (see the sketch after this quote).
"
Roadmap: v1.1+ (see #1)