RL Tutorial on Stable Baselines

www.dlr.de · Antonin RAFFIN · Stable Baselines Tutorial · JNRR 2019 · 18.10.2019

Stable Baselines Tutorial

Reinforcement Learning Made Easy

Ashley HILL
CEA

Edward Beeching
INSA Lyon

Antonin RAFFIN
German Aerospace Center (DLR)

Examples of Reinforcement Learning for Robotics

Learning Agile and Dynamic Motor Skills for Legged Robots (1)
Dexterous Manipulation
Learning to toss
Learning to Drive in Minutes
Wave RL

Deep Mimic
Bonus

Stable Baselines Library

github repo

Features

[Figure: Stable Baselines features]

Algorithms

[Figure: RL algorithms implemented in Stable Baselines]

Active Community

[Figure: number of git clones of the repository]

Tutorial

GitHub repo: https://github.com/araffin/rl-tutorial-jnrr19

  1. Getting Started (Colab Notebook)
  2. Gym Wrappers, Saving and Loading Models (Colab Notebook)
  3. Multiprocessing (Colab Notebook)
  4. Callbacks and Hyperparameter Tuning (Colab Notebook)
  5. Creating a Custom Gym Environment (Colab Notebook)
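
As a taste of the last notebook, here is a minimal sketch of an environment that follows the classic Gym interface: `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`. It deliberately avoids importing `gym` so it runs anywhere. `LineWorldEnv` and `run_episode` are hypothetical names made up for this illustration, and a real environment would also subclass `gym.Env` and declare its action and observation spaces.

```python
class LineWorldEnv:
    """Toy 1D world with a Gym-style interface (hypothetical example).

    The agent starts in the rightmost cell of a line and gets a reward
    of 1 when it reaches the leftmost cell, which ends the episode.
    """

    LEFT = 0
    RIGHT = 1

    def __init__(self, grid_size=10):
        self.grid_size = grid_size
        self.agent_pos = grid_size - 1

    def reset(self):
        """Start a new episode and return the first observation."""
        self.agent_pos = self.grid_size - 1
        return self.agent_pos

    def step(self, action):
        """Apply an action and return (observation, reward, done, info)."""
        if action == self.LEFT:
            self.agent_pos -= 1
        elif action == self.RIGHT:
            self.agent_pos += 1
        else:
            raise ValueError(f"Invalid action: {action}")
        # Keep the agent inside the line
        self.agent_pos = min(max(self.agent_pos, 0), self.grid_size - 1)
        done = self.agent_pos == 0
        reward = 1.0 if done else 0.0
        return self.agent_pos, reward, done, {}


def run_episode(env, policy, max_steps=100):
    """Roll out one episode with a policy (observation -> action)."""
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        obs, reward, done, _info = env.step(policy(obs))
        total_reward += reward
        if done:
            break
    return total_reward
```

Calling `run_episode(LineWorldEnv(), lambda obs: LineWorldEnv.LEFT)` rolls out a policy that always moves left.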

Your turn!

https://github.com/araffin/rl-tutorial-jnrr19

Source: Deep Mimic (Jason Peng)

Branches of Machine Learning

[Figure: taxonomy of machine learning]

Source: Outsider Tour RL by Ben Recht

Many Faces of RL

[Figure: the many faces of RL]

Source: David Silver Course

RL 101

[Figure: the agent-environment interaction loop]

Source: Lilian Weng blog

[Image: Baymax]

Credit: L.M Tenkes

Notation

          Reinforcement Learning   Classical Control
State     $s_t$                    $x_t$
Action    $a_t$                    $u_t$
Reward    $r_t$                    $-c_t$

Main Components of an RL algo

An RL algo may include one or more of these components:

Model Free vs Model Based

Model Based RL

[Figure: model-based RL]

Source: BAIR blog

On-Policy vs Off-policy

Model Free RL Landscape

[Figure: model-free RL landscape]

Exploration vs Exploitation Trade-Off (1)

Exploration: Try a new beer

Exploitation: Drink your favorite beer

Exploration vs Exploitation Trade-Off (2)

Exploration: gather more information about the environment

Exploitation: use the best known strategy to maximize reward
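
One common way to balance the two is an epsilon-greedy rule, sketched below on a toy multi-armed bandit. This is an illustration under stated assumptions rather than anything from Stable Baselines: `epsilon_greedy_bandit`, the Gaussian rewards, and the default values are all made up for the example.

```python
import random


def epsilon_greedy_bandit(true_means, n_steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a bandit with Gaussian rewards (toy example).

    Returns the estimated mean reward and the pull count of each arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms

    for _ in range(n_steps):
        if rng.random() < epsilon:
            # Exploration: try a random arm to gather information
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: pull the arm with the best estimate so far
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the running mean for this arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts
```

With `epsilon=0` the agent never explores and can get stuck on whichever arm looked good first. With `epsilon=1` it never uses what it has learned.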

Common Assumptions

Markov: the next state depends only on the current state and action, not on the complete history
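
In the notation used here, this assumption is usually written as follows (a standard textbook formulation, not taken from the slides):

$$
P(s_{t+1} \mid s_t, a_t) = P(s_{t+1} \mid s_1, a_1, \dots, s_t, a_t)
$$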

Fully Observable: the agent directly observes the environment state ($o_t = s_t$). Example: chess (fully observable) vs. poker (partially observable)

Recap

Current Challenges of RL

Topics not covered

Resources
RL Zoo: A collection of 120+ trained RL agents
  1. Provide a simple interface to train and enjoy RL agents
  2. Benchmark the different Reinforcement Learning algorithms
  3. Provide tuned hyperparameters for each environment and RL algorithm
  4. Have fun with the trained agents!

https://github.com/araffin/rl-baselines-zoo

RL Zoo: Training
```yaml
HalfCheetahBulletEnv-v0:
  env_wrapper: utils.wrappers.TimeFeatureWrapper
  n_timesteps: !!float 2e6
  policy: 'MlpPolicy'
  gamma: 0.99
  buffer_size: 1000000
  noise_type: 'normal'
  noise_std: 0.1
  learning_starts: 10000
  batch_size: 100
  learning_rate: !!float 1e-3
  train_freq: 1000
  gradient_steps: 1000
  policy_kwargs: 'dict(layers=[400, 300])'
```
         				
```shell
python train.py --algo td3 --env HalfCheetahBulletEnv-v0
python enjoy.py --algo td3 --env HalfCheetahBulletEnv-v0
python -m utils.record_video --algo td3 --env HalfCheetahBulletEnv-v0 -n 1000
```
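
Two entries in the HalfCheetahBulletEnv-v0 config are worth a closer look: `n_timesteps` is tagged `!!float` so that `2e6` is read as a number, and `policy_kwargs` is a string holding a Python expression. The sketch below shows one way a training script could turn such entries into Python objects. `parse_hyperparams` and `parse_dict_call` are hypothetical helpers, and this is an assumption about how it could be done, not a description of what `train.py` does.

```python
import ast


def parse_dict_call(expr):
    """Parse a string such as 'dict(layers=[400, 300])' without eval().

    Only a call to dict() with literal keyword arguments is accepted.
    """
    node = ast.parse(expr, mode="eval").body
    is_dict_call = (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "dict"
        and not node.args
    )
    if not is_dict_call or any(kw.arg is None for kw in node.keywords):
        raise ValueError(f"Expected dict(key=value, ...), got: {expr!r}")
    # literal_eval only accepts Python literals (numbers, lists, strings...)
    return {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}


def parse_hyperparams(raw):
    """Return a copy of the config with string entries converted (hypothetical)."""
    params = dict(raw)
    # 2e6 is a float in the config but is used as an integer step count
    params["n_timesteps"] = int(float(params["n_timesteps"]))
    if isinstance(params.get("policy_kwargs"), str):
        params["policy_kwargs"] = parse_dict_call(params["policy_kwargs"])
    return params
```

Parsing the expression with `ast` instead of calling `eval()` is a design choice of this sketch: it rejects anything other than a plain `dict(...)` call with literal values.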
         				
RL Zoo: Hyperparameter Optimization

Optuna