Implement DQN by ndormann · Pull Request #28 · DLR-RM/stable-baselines3

LGTM. Although we could not completely reproduce the results on Atari games, it shows good results on classic gym environments (with tuned hyperparameters).
I suspect the difference comes from our pre-processing/feature extraction rather than from the algorithm itself.

So I would lean towards merging this one soon, as it also brings improvements for off-policy algorithms.
Hyperparameter tuning for Atari is still running, though; I will see what we can achieve once it finishes.