[feature-request] N-step returns for TD methods · Issue #47 · DLR-RM/stable-baselines3
Description
Originally posted by @partiallytyped in hill-a/stable-baselines#821
"
N-step returns allow for much better stability and improve performance when training TD methods such as DQN and DDPG, so it would be quite useful to have this feature.
A simple implementation would be a wrapper around ReplayBuffer, so it would work with both prioritized and uniform sampling. The wrapper keeps a queue of observed experiences, computes the n-step returns, and adds the resulting transitions to the buffer (see the sketch after this quote).
"
Roadmap: v1.1+ (see #1)