SAC implementation is 2x slower than in stable-baselines · Issue #122 · DLR-RM/stable-baselines3
Hello,
First of all, thanks for working on this awesome project!
I've tried the SAC implementation and noticed that it runs much slower than the TF1 version from stable-baselines.
Here is a minimal stable-baselines3 example:
```python
import os

import gym
import torch

from stable_baselines3 import SAC
from stable_baselines3.sac.policies import MlpPolicy

os.environ['CUDA_VISIBLE_DEVICES'] = ''
torch.set_num_threads(2)

env = gym.make('Pendulum-v0')
model = SAC(
    MlpPolicy,
    env,
    verbose=1,
    buffer_size=int(1e6),
    batch_size=256,
    policy_kwargs={'net_arch': [256, 256], 'activation_fn': torch.nn.ReLU},
)
model.learn(total_timesteps=1000000, log_interval=10)
```
Here is the corresponding stable-baselines (TF1) example:
```python
import os

import gym
import tensorflow as tf

from stable_baselines import SAC
from stable_baselines.sac.policies import MlpPolicy

os.environ['CUDA_VISIBLE_DEVICES'] = ''

env = gym.make('Pendulum-v0')
model = SAC(
    MlpPolicy,
    env,
    verbose=1,
    buffer_size=int(1e6),
    batch_size=256,
    policy_kwargs={'layers': [256, 256], 'act_fun': tf.nn.relu},
    n_cpu_tf_sess=2,
)
model.learn(total_timesteps=1000000, log_interval=10)
```
I set the same architecture, number of updates, and batch size, so all the relevant settings should match. However, the PyTorch version gets ~45 FPS, while the TF1 version gets ~90 FPS.
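In case it helps, here is a sketch of how the slow run could be profiled with the standard-library profiler to see where the time goes (the profiling step and the short run length are just illustrative, not part of the comparison above):

```python
import cProfile
import pstats

import gym
import torch

from stable_baselines3 import SAC
from stable_baselines3.sac.policies import MlpPolicy

torch.set_num_threads(2)
env = gym.make('Pendulum-v0')
model = SAC(MlpPolicy, env, verbose=0, buffer_size=int(1e6), batch_size=256,
            policy_kwargs={'net_arch': [256, 256], 'activation_fn': torch.nn.ReLU})

# Profile a short run to see whether time is spent in the forward/backward
# passes or elsewhere (optimizer, replay buffer, logging, ...).
cProfile.run('model.learn(total_timesteps=5000)', 'sac_profile')
pstats.Stats('sac_profile').sort_stats('cumulative').print_stats(20)
```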
System Info
Libraries are installed from pip; I have the newest stable-baselines and stable-baselines3, PyTorch 1.5.1, and TensorFlow 1.15.0. I run on CPU only. This was run on a MacBook Pro, and I got similar results on another Linux machine.
Note that I also tried varying the number of CPU threads, but even with the best setting the PyTorch version is still about 2x slower. A sketch of the kind of sweep I mean is below.
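The loop and the number of timesteps here are illustrative only; running each setting in a fresh process would be cleaner, since PyTorch creates its intra-op thread pools lazily:

```python
import time

import gym
import torch

from stable_baselines3 import SAC
from stable_baselines3.sac.policies import MlpPolicy

# Time a short training run at several thread counts and report the
# resulting throughput in environment steps per second.
for n_threads in (1, 2, 4, 8):
    torch.set_num_threads(n_threads)
    env = gym.make('Pendulum-v0')
    model = SAC(MlpPolicy, env, verbose=0, buffer_size=int(1e6), batch_size=256,
                policy_kwargs={'net_arch': [256, 256], 'activation_fn': torch.nn.ReLU})
    start = time.time()
    model.learn(total_timesteps=5000)
    print(f"{n_threads} threads: ~{5000 / (time.time() - start):.1f} FPS")
```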