Deep RL Workshop, NeurIPS 2020
- [pdf] [video] Amortized Variational Deep Q Network
- Haotian Zhang (Xi'an Jiaotong University); Yuhao Wang (Xi'an Jiaotong University); Jianyong Sun (Xi'an Jiaotong University)*; Zongben Xu (Xi'an Jiaotong University)
- [pdf] [video] DREAM: Deep Regret minimization with Advantage baselines and Model-free learning
- Eric Steinberger (Climate Science); Adam Lerer (Facebook AI Research); Noam Brown (Facebook AI Research)
- [pdf] [supplementary material] [video] Learning Functionally Decomposed Hierarchies for Continuous Control Tasks with Path Planning
- Sammy Christen (ETH Zurich); Lukas Jendele (ETH Zurich); Emre Aksan (ETH Zurich); Otmar Hilliges (ETH Zurich)
- [pdf] [video] Safety Aware Reinforcement Learning
- Santiago Miret (Intel Labs); Somdeb Majumdar (Intel Labs); Carroll Wainwright (Partnership on AI)
- [pdf] [video] PixL2R: Guiding Reinforcement Learning Using Natural Language by Mapping Pixels to Rewards
- Prasoon Goyal (UT Austin); Scott Niekum (UT Austin); Raymond Mooney (UT Austin)
- [pdf] [video] Asymmetric self-play for automatic goal discovery in robotic manipulation
- OpenAI (OpenAI); Matthias Plappert (OpenAI); Raul Sampedro (OpenAI); Tao Xu (OpenAI); Ilge Akkaya (OpenAI); Vineet Kosaraju (OpenAI); Peter Welinder (OpenAI); Ruben D'Sa (OpenAI); Arthur Petron (OpenAI); Henrique Ponde (OpenAI); Alex Paino (OpenAI); Hyeonwoo Noh (OpenAI); Lilian Weng (OpenAI)*; Qiming Yuan (OpenAI); Casey Chu (OpenAI); Wojciech Zaremba (OpenAI)
- [pdf] [video] Sample Efficient Training in Multi-Agent Adversarial Games with Limited Teammate Communication
- Hardik Meisheri (TCS Research); Harshad Khadilkar (TCS Research)
- [pdf] [supplementary material] [video] Multi-task Reinforcement Learning with a Planning Quasi-Metric
- Vincent Micheli (EPFL); Karthigan Sinnathamby (EPFL); François Fleuret (University of Geneva)
- [pdf] [video] Disentangled Planning and Control in Vision Based Robotics via Reward Machines
- Alberto Camacho (Google); Jacob Varley (Google); Andy Zeng (Google); Deepali Jain (Google); Atil Iscen (Google); Dmitry Kalashnikov (Google)
- [pdf] [supplementary material] [video] Maximum Mutation Reinforcement Learning for Scalable Control
- Karush Suri (University of Toronto); Xiao Qi Shi (RBC Capital Markets); Konstantinos Plataniotis (University of Toronto); Yuri Lawryshyn (University of Toronto)
- [pdf] [supplementary material] [video] Energy-based Surprise Minimization for Multi-Agent Value Factorization
- Karush Suri (University of Toronto); Xiao Qi Shi (RBC Capital Markets); Konstantinos Plataniotis (University of Toronto); Yuri Lawryshyn (University of Toronto)
- [pdf] [supplementary material] [video] Correcting Momentum in Temporal Difference Learning
- Emmanuel Bengio (McGill University); Joelle Pineau (McGill / Facebook); Doina Precup (McGill University)
- [pdf] [supplementary material] [video] A Policy Gradient Method for Task-Agnostic Exploration
- Mirco Mutti (Politecnico di Milano, Università di Bologna); Lorenzo Pratissoli (Politecnico di Milano); Marcello Restelli (Politecnico di Milano)
- [pdf] [video] Dream and Search to Control: Latent Space Planning for Continuous Control
- Anurag Koul (Oregon State University); Varun Kumar (Intel AI Lab); Alan Fern (Oregon State University); Somdeb Majumdar (Intel Labs)
- [pdf] [video] Unsupervised Task Clustering for Multi-Task Reinforcement Learning
- Johannes Ackermann (Technical University of Munich); Oliver Richter (ETH Zurich); Roger Wattenhofer (ETH Zurich)
- [pdf] [video] Learning Intrinsic Symbolic Rewards in Reinforcement Learning
- Hassam Sheikh (University of Central Florida); Shauharda Khadka (Oregon State University); Santiago Miret (Intel Labs); Somdeb Majumdar (Intel Labs)
- [pdf] [video] Preventing Value Function Collapse in Ensemble Q-Learning by Maximizing Representation Diversity
- Hassam Sheikh (University of Central Florida); Ladislau Boloni (University of Central Florida)
- [pdf] [video] Quantifying Differences in Reward Functions
- Adam Gleave (UC Berkeley); Michael Dennis (UC Berkeley); Shane Legg; Stuart Russell (UC Berkeley); Jan Leike (DeepMind)
- [pdf] [video] Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies
- Yunhao Tang (Columbia University); Krzysztof Choromanski (Google Brain Robotics)
- [pdf] [video] DERAIL: Diagnostic Environments for Reward And Imitation Learning
- Pedro Freire (Ecole Polytechnique); Adam Gleave (UC Berkeley); Sam Toyer (UC Berkeley); Stuart Russell (UC Berkeley)
- [pdf] [video] Exploring Zero-Shot Emergent Communication in Embodied Multi-Agent Populations
- Kalesha Bullard (Facebook AI Research); Franziska Meier (Facebook AI Research); Douwe Kiela (Facebook AI Research); Joelle Pineau (Facebook); Jakob Foerster (Facebook)
- [pdf] [video] On Effective Parallelization of Monte Carlo Tree Search
- Anji Liu (UCLA); Yitao Liang (UCLA); Ji Liu (Kwai Inc.); Guy Van den Broeck (UCLA); Jianshu Chen (Tencent AI Lab)
- [pdf] [video] Unlocking the Potential of Deep Counterfactual Value Networks
- Ryan Zarick (Minimal AI); Bryan Pellegrino (Minimal AI); Noam Brown (Facebook AI Research); Caleb Banister (Minimal AI)
- [pdf] [video] FactoredRL: Leveraging Factored Graphs for Deep Reinforcement Learning
- Bharathan Balaji (Amazon); Petros Christodoulou (Amazon); Xiaoyu Lu (Amazon); Byungsoo Jeon (Amazon); Jordan Bell-Masterson (Amazon)
- [pdf] [video] Reusability and Transferability of Macro Actions for Reinforcement Learning
- Yi Hsiang Chang (National Tsing Hua University); Kuan-Yu Chang (National Tsing Hua University); Henry Kuo (Harvard University); Chun-Yi Lee (National Tsing Hua University)
- [pdf] [video] Decoupling Exploration and Exploitation in Meta-Reinforcement Learning without Sacrifices
- Evan Liu (Stanford University); Aditi Raghunathan (Stanford University); Percy Liang (Stanford University); Chelsea Finn (Stanford)
- [pdf] Mastering Atari with Discrete World Models
- Danijar Hafner (Google); Timothy Lillicrap (DeepMind); Mohammad Norouzi (Google Research, Brain Team); Jimmy Ba (University of Toronto)
- [pdf] [video] Action and Perception as Divergence Minimization
- Danijar Hafner (Google); Pedro Ortega (DeepMind); Jimmy Ba (University of Toronto); Thomas Parr (University College London); Karl Friston (University College London); Nicolas Heess (DeepMind)
- [pdf] [video] Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
- Rishabh Agarwal (Google Research, Brain Team); Marlos C. Machado (Google Brain); Pablo Samuel Castro (Google); Marc G. Bellemare (Google Brain)
- [pdf] [video] Skill Transfer via Partially Amortized Hierarchical Planning
- Kevin Xie (University of Toronto); Homanga Bharadhwaj (University of Toronto, Vector Institute); Danijar Hafner (Google); Animesh Garg (University of Toronto, Vector Institute, Nvidia); Florian Shkurti (University of Toronto)
- [pdf] [video] Average Reward Reinforcement Learning with Monotonic Policy Improvement
- Yiming Zhang (New York University); Keith Ross (New York University Shanghai)
- [pdf] [video] Randomized Ensembled Double Q-Learning: Learning Fast Without a Model
- Xinyue Chen (NYU Shanghai); Che Wang (New York University); Zijian Zhou (NYU Shanghai); Keith Ross (New York University Shanghai)
- [pdf] [video] Combating False Negatives in Adversarial Imitation Learning
- Konrad Żołna (Jagiellonian University); Chitwan Saharia (Indian Institute of Technology, Bombay); Léonard Boussioux (MIT, CentraleSupélec); David Yu-Tung Hui (Mila); Maxime Chevalier-Boisvert (Mila, Université de Montréal); Dzmitry Bahdanau (Element AI); Yoshua Bengio (Mila)
- [pdf] [video] Evaluating Agents Without Rewards
- Brendon Matusch; Jimmy Ba (University of Toronto); Danijar Hafner (Google)
- [pdf] [video] World Model as a Graph: Learning Latent Landmarks for Planning
- Lunjun Zhang (University of Toronto); Ge Yang (University of Chicago); Bradly Stadie (Vector Institute)
- [pdf] [video] Interactive Visualization for Debugging RL
- Shuby Deshpande (Carnegie Mellon University); Ben Eysenbach (Carnegie Mellon University); Jeff Schneider
- [pdf] [video] Conservative Safety Critics for Exploration
- Homanga Bharadhwaj (University of Toronto, Vector Institute); Aviral Kumar (UC Berkeley); Nicholas Rhinehart (UC Berkeley); Sergey Levine (UC Berkeley); Florian Shkurti (University of Toronto); Animesh Garg (University of Toronto, Vector Institute, Nvidia)
- [pdf] [video] D2RL: Deep Dense Architectures in Reinforcement Learning
- Samarth Sinha (University of Toronto, Vector Institute); Homanga Bharadhwaj (University of Toronto, Vector Institute); Aravind Srinivas (UC Berkeley); Animesh Garg (University of Toronto, Vector Institute, Nvidia)
- [pdf] [video] Weighted Bellman Backups for Improved Signal-to-Noise in Q-Updates
- Kimin Lee (UC Berkeley); Michael Laskin (UC Berkeley); Aravind Srinivas (UC Berkeley); Pieter Abbeel (UC Berkeley)
- [pdf] [video] Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms
- Chao Yu (Tsinghua University); Akash Velu (UC Berkeley); Eugene Vinitsky (UC Berkeley); Yu Wang (Tsinghua University); Alexandre Bayen (University of California, Berkeley); Yi Wu (OpenAI)
- [pdf] [supplementary material] [video] A Deep Value-based Policy Search Approach for Real-world Vehicle Repositioning on Mobility-on-Demand Platforms
- Yan Jiao (Didi Research America); Xiaocheng Tang (DiDi AI Labs); Zhiwei Qin (Didi Research America); Shuaiji Li (DiDi AI Labs); Fan Zhang (DiDi AI Labs); Hongtu Zhu (AI Labs, Didi Chuxing); Jieping Ye (Didi Chuxing)
- [pdf] [video] Solving Compositional Reinforcement Learning Problems via Task Reduction
- Yunfei Li (Tsinghua University); Huazhe Xu (UC Berkeley); Yilin Wu (Shanghai Qi Zhi Institute); Xiaolong Wang (UCSD); Yi Wu (OpenAI)
- [pdf] [video] Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization
- Zhenggang Tang (Peking University); Chao Yu (Tsinghua University); Boyuan Chen (UC Berkeley); Huazhe Xu (UC Berkeley); Xiaolong Wang (UCSD); Fei Fang (Carnegie Mellon University); Simon Du (University of Washington); Yu Wang (Tsinghua University); Yi Wu (OpenAI)
- [pdf] [video] FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance
- Xiao-Yang Liu (Columbia University); Hongyang Yang (Columbia University); Qian Chen (Columbia University); Runjia Zhang (AI4Finance LLC); Liuqing Yang (Columbia University); Bowen Xiao (Imperial College); Christina Dan Wang (New York University)
- [pdf] [video] What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study
- Marcin Andrychowicz (Google); Anton Raichuk (Google); Piotr Stańczyk (Google Brain); Manu Orsini (Google Brain); Sertan Girgin (Google Brain); Raphael Marinier (Google); Léonard Hussenot (Google Research, Brain Team); Matthieu Geist (Google Brain); Olivier Pietquin (Google Research - Brain Team); Marcin Michalski (Google); Sylvain Gelly (Google Brain); Olivier Bachem (Google Brain)
- [pdf] [video] Semantic State Representation for Reinforcement Learning
- Erez Schwartz (Technion); Guy Tennenholtz (Technion); Chen Tessler (Technion); Shie Mannor (Technion)
- [pdf] [video] Deep Q-Learning with Low Switching Cost
- Shusheng Xu (Tsinghua University); Simon Du (University of Washington); Yi Wu (OpenAI)
- [pdf] [supplementary material] [video] Diverse Exploration via InfoMax Options
- Yuji Kanagawa (The University of Tokyo); Tomoyuki Kaneko (The University of Tokyo)
- [pdf] [video] Hyperparameter Auto-tuning in Self-Supervised Robotic Learning
- Jiancong Huang (Guangdong University of Technology)
- [pdf] [video] Learning to Represent Action Values as a Hypergraph on the Action Vertices
- Arash Tavakoli (Imperial College London); Mehdi Fatemi (Microsoft Research); Petar Kormushev (Imperial College London)
- [pdf] [video] Explanation Augmented Feedback in Human-in-the-Loop Reinforcement Learning
- Lin Guan (Arizona State University); Mudit Verma (Arizona State University); Sihang Guo (University of Texas at Austin); Ruohan Zhang (University of Texas at Austin); Subbarao Kambhampati (Arizona State University)
- [pdf] [video] Goal-Conditioned Reinforcement Learning in the Presence of an Adversary
- Carlos Purves (University of Cambridge); Pietro Liò (University of Cambridge); Cătălina Cangea (University of Cambridge)
- [pdf] [supplementary material] [video] Regularized Inverse Reinforcement Learning
- Wonseok Jeon (MILA, McGill University); Chen-Yang Su (MILA, McGill University); Paul Barde (MILA, McGill University); Thang Doan (Mila / McGill University); Derek Nowrouzezahrai (McGill University); Joelle Pineau (McGill / Facebook)
- [pdf] [video] Planning from Pixels using Inverse Dynamics Models
- Keiran Paster (University of Toronto); Sheila McIlraith (University of Toronto); Jimmy Ba (University of Toronto)
- [pdf] [video] Visual Imitation with Reinforcement Learning using Recurrent Siamese Networks
- Glen Berseth (University of California Berkeley); Florian Golemo (Mila, Element AI); Chris Pal (MILA, Polytechnique Montréal, Element AI)
- [pdf] [video] Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning
- Nathan Lambert (UC Berkeley); Albert Wilcox (UC Berkeley); Howard Zhang (UC Berkeley); Kristofer Pister (UC Berkeley); Roberto Calandra (Facebook)
- [pdf] [video] XLVIN: eXecuted Latent Value Iteration Nets
- Andreea Deac (Mila, Université de Montréal); Petar Veličković (DeepMind); Ognjen Milinković (University of Belgrade); Pierre-Luc Bacon (Mila); Jian Tang (U Montreal); Mladen Nikolic (University of Belgrade)
- [pdf] Model-Based Meta-Reinforcement Learning for Flight with Suspended Payloads
- Suneel Belkhale (UC Berkeley); Rachel Li (University of California, Berkeley); Gregory Kahn (UC Berkeley); Rowan McAllister (UC Berkeley); Roberto Calandra (UC Berkeley)
- [pdf] [supplementary material] [video] Targeted Query-based Action-Space Adversarial Policies on Deep Reinforcement Learning Agents
- Xian Yeow Lee (Iowa State University); Yasaman Esfandiari (Iowa State University); Kai Liang Tan (Iowa State University); Soumik Sarkar (Iowa State University)
- [pdf] [video] Parrot: Data-driven Behavioral Priors for Reinforcement Learning
- Avi Singh (UC Berkeley); Huihan Liu (UC Berkeley); Gaoyue Zhou (University of California, Berkeley); Albert Yu (UC Berkeley); Nick Rhinehart; Sergey Levine (UC Berkeley)
- [pdf] [video] Addressing Distribution Shift in Online Reinforcement Learning with Offline Datasets
- Seunghyun Lee (KAIST); Younggyo Seo (KAIST); Kimin Lee (UC Berkeley); Pieter Abbeel (UC Berkeley); Jinwoo Shin (KAIST)
- [pdf] [video] C-Learning: Horizon-Aware Cumulative Accessibility Estimation
- Panteha Naderian (Layer 6 AI); Gabriel Loaiza-Ganem (Layer 6 AI); Harry Braviner (Layer 6 AI); Anthony Caterini (Layer 6 AI); Jesse Cresswell (Layer 6 AI); Tong Li (Layer 6 AI); Animesh Garg (University of Toronto, Vector Institute, Nvidia)
- [pdf] [video] Abstract Value Iteration for Hierarchical Deep Reinforcement Learning
- Kishor Jothimurugan (University of Pennsylvania); Osbert Bastani (University of Pennsylvania); Rajeev Alur (University of Pennsylvania)
- [pdf] [video] Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
- Aviral Kumar (UC Berkeley); Rishabh Agarwal (Google Research, Brain Team); Dibya Ghosh (UC Berkeley); Sergey Levine (UC Berkeley)
- [pdf] [video] Beyond Exponentially Discounted Sum: Automatic Learning of Return Function
- Yufei Wang (Carnegie Mellon University); Qiwei Ye (Microsoft); Tie-Yan Liu (Microsoft)
- [pdf] [video] TACTO: A Simulator for Learning Control from Touch Sensing
- Shaoxiong Wang (MIT); Mike Lambeta (Facebook); Po-Wei Chou (Facebook); Roberto Calandra (Facebook)
- [pdf] [video] XT2: Training an X-to-Text Typing Interface with Online Learning from Implicit Feedback
- Jensen Gao (UC Berkeley); Siddharth Reddy (UC Berkeley); Glen Berseth (University of California Berkeley); Anca Dragan (EECS Department, University of California, Berkeley); Sergey Levine (UC Berkeley)
- [pdf] [video] Safe Reinforcement Learning with Natural Language Constraints
- Tsung-Yen Yang (Princeton University); Michael Hu (Princeton University); Yinlam Chow (Google AI); Peter Ramadge (Princeton); Karthik Narasimhan (Princeton University)
- [pdf] [video] Compute- and Memory-Efficient Reinforcement Learning with Latent Experience Replay
- Lili Chen (UC Berkeley); Kimin Lee (UC Berkeley); Aravind Srinivas; Pieter Abbeel (UC Berkeley)
- [pdf] [video] Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks
- Sungryull Sohn (University of Michigan); Sungtae Lee (Yonsei University); Jongwook Choi (University of Michigan); Honglak Lee (University of Michigan / Google Research); Harm van Seijen (Microsoft); Mehdi Fatemi (Microsoft Research)
- [pdf] [video] Greedy Multi-Step Off-Policy Reinforcement Learning
- Yuhui Wang (Nanjing University of Aeronautics and Astronautics, China); Xiaoyang Tan (Nanjing University of Aeronautics and Astronautics, China)
- [pdf] [video] OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning
- Anurag Ajay (MIT); Aviral Kumar (UC Berkeley); Pulkit Agrawal (UC Berkeley); Sergey Levine (Google); Ofir Nachum (Google)
- [pdf] [video] Emergent Road Rules In Multi-Agent Driving Environments
- Avik Pal (Indian Institute of Technology Kanpur); Jonah Philion (University of Toronto, NVIDIA); Yuan-Hong Liao (University of Toronto); Sanja Fidler (University of Toronto, NVIDIA)
- [pdf] [supplementary material] [video] Modularity in Reinforcement Learning: An Algorithmic Causality Perspective on Credit Assignment
- Michael Chang (UC Berkeley)*; Sid Kaushik (UC Berkeley)*; Sergey Levine (UC Berkeley); Tom Griffiths (Princeton)
- [pdf] [video] An Examination of Preference-based Reinforcement Learning for Treatment Recommendation
- Nan Xu (University of Southern California); Nitin Kamra (University of Southern California); Yan Liu (USC)
- [pdf] [supplementary material] [video] Learning to Weight Imperfect Demonstrations
- Yunke Wang (Wuhan University); Chang Xu (University of Sydney); Bo Du (Wuhan University); Honglak Lee (University of Michigan / Google Research)
- [pdf] [supplementary material] [video] Structure and randomness in planning and reinforcement learning
- Piotr Kozakowski (University of Warsaw); Piotr Januszewski (University of Warsaw & Gdansk University of Technology); Konrad Czechowski (University of Warsaw); Łukasz Kuciński (Polish Academy of Sciences); Piotr Miłoś (Polish Academy of Sciences)
- [pdf] [video] Variational Empowerment as Representation Learning for Goal-Based Reinforcement Learning
- Jongwook Choi (Google); Archit Sharma (Google); Sergey Levine (Google); Honglak Lee (Google / U. Michigan); Shixiang Gu (Google Brain)
- [pdf] [video] Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation
- Chenyang Zhao (University of Edinburgh); Timothy Hospedales (Edinburgh University)
- [pdf] [video] Data-Efficient Reinforcement Learning with Self-Predictive Representations
- Max Schwarzer (Mila, Université de Montréal); Ankesh Anand (MILA); Rishab Goel (Mila); R Devon Hjelm (Microsoft Research); Aaron Courville (Universite de Montreal); Philip Bachman (Microsoft Research)
- [pdf] [video] Accelerating Reinforcement Learning with Learned Skill Priors
- Karl Pertsch (University of Southern California); Youngwoon Lee (University of Southern California); Joseph Lim (USC)
- [pdf] [video] Model-based Navigation in Environments with Novel Layouts Using Abstract n-D Maps
- Linfeng Zhao (Northeastern University); Lawson Wong (Northeastern University)
- [pdf] [video] Parameter-based Value Functions
- Francesco Faccio (The Swiss AI Lab IDSIA); Louis Kirsch (Swiss AI Lab IDSIA); Jürgen Schmidhuber (IDSIA - Lugano)
- [pdf] [video] Online Safety Assurance for Deep Reinforcement Learning
- Noga Rotman (Hebrew University of Jerusalem); Michael Schapira (Hebrew University); Aviv Tamar (UC Berkeley)
- [pdf] [video] Lyapunov Barrier Policy Optimization
- Harshit Sikchi (Carnegie Mellon University); Wenxuan Zhou (Carnegie Mellon University); David Held (CMU)
- [pdf] [video] C-Learning: Learning to Achieve Goals via Recursive Classification
- Ben Eysenbach (Carnegie Mellon University); Ruslan Salakhutdinov (Carnegie Mellon University); Sergey Levine (UC Berkeley)
- [pdf] [video] Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers
- Ben Eysenbach (Carnegie Mellon University); Shreyas Chaudhari (Carnegie Mellon University); Swapnil Asawa (University of Pittsburgh); Ruslan Salakhutdinov (Carnegie Mellon University); Sergey Levine (UC Berkeley)
- [pdf] [supplementary material] [video] Influence-aware Memory for Deep Reinforcement Learning in POMDPs
- Miguel Suau (Delft University of Technology); Elena Congeduti (Delft University of Technology); Jinke He (Delft University of Technology); Rolf Starre (Delft University of Technology); Aleksander Czechowski (TU Delft); Frans Oliehoek (TU Delft)
- [pdf] [video] Multi-Robot Deep Reinforcement Learning via Hierarchically Integrated Models
- Katie Kang (UC Berkeley); Gregory Kahn (UC Berkeley); Sergey Levine (University of California, Berkeley)
- [pdf] [video] ReaPER: Improving Sample Efficiency in Model-Based Latent Imagination
- Martin Bertran (Duke University); Guillermo Sapiro (Duke University); Mariano Phielipp (Intel AI Lab)
- [pdf] [video] Maximum Reward Formulation In Reinforcement Learning
- Sai Krishna Gottipati (99andBeyond); Yashaswi Pathak (International Institute of Information Technology, Hyderabad); Rohan Nuttall (University of Alberta); Sahir (University of Alberta); Raviteja Chunduru (McGill University); Ahmed Touati (MILA); Sriram Ganapathi Subramanian (University of Waterloo); Matthew Taylor (U. of Alberta); Sarath Chandar (Mila)
- [pdf] [video] How to make Deep RL work in Practice
- Nirnai Rao (Technical University of Munich); Elie Aljalbout (Technical University of Munich); Axel Sauer (University of Tuebingen); Sami Haddadin (Technical University of Munich)
- [pdf] [supplementary material] [video] Super-Human Performance in Gran Turismo Sport Using Deep Reinforcement Learning
- Florian Fuchs (Sony); Yunlong Song (ETH / University of Zurich); Elia Kaufmann (ETH / University of Zurich); Davide Scaramuzza (University of Zurich & ETH Zurich, Switzerland); Peter Dürr (Sony Europe)
- [pdf] [video] Model-Based Reinforcement Learning: A Compressed Survey
- Thomas Moerland (Delft University of Technology); Joost Broekens (Leiden University); Catholijn Jonker (Delft University of Technology)
- [pdf] [video] Evolving Reinforcement Learning Algorithms
- John Co-Reyes (UC Berkeley); Yingjie Miao (Google); Daiyi Peng (Google Brain); Quoc Le (Google Brain); Sergey Levine (Google); Honglak Lee (Google / U. Michigan); Aleksandra Faust (Google Brain)
- [pdf] [video] Learning to Reach Goals via Iterated Supervised Learning
- Dibya Ghosh (UC Berkeley); Abhishek Gupta (UC Berkeley); Ashwin Reddy (UC Berkeley); Justin Fu (UC Berkeley); Coline Devin (University of California, Berkeley); Ben Eysenbach (Carnegie Mellon University); Sergey Levine (UC Berkeley)
- [pdf] [video] Which Mutual-Information Representation Learning Objectives are Sufficient for Control?
- Kate Rakelly (UC Berkeley); Abhishek Gupta (UC Berkeley); Carlos Florensa (UC Berkeley); Sergey Levine (UC Berkeley)
- [pdf] [video] BeBold: Exploration Beyond the Boundary of Explored Regions
- Tianjun Zhang (UC Berkeley); Huazhe Xu (UC Berkeley); Xiaolong Wang (UCSD); Yi Wu (OpenAI); Kurt Keutzer (EECS, UC Berkeley); Joseph Gonzalez (UC Berkeley); Yuandong Tian (Facebook)
- [pdf] [supplementary material] [video] Curriculum Learning through Distilled Discriminators
- Rahul Siripurapu (USI); Louis Kirsch (Swiss AI Lab IDSIA); Jürgen Schmidhuber (IDSIA - Lugano)
- [pdf] [video] Chaining Behaviors from Data with Model-Free Reinforcement Learning
- Avi Singh (UC Berkeley); Albert Yu (UC Berkeley); Jonathan Yang (UC Berkeley); Aviral Kumar (UC Berkeley); Jesse Zhang (UC Berkeley); Sergey Levine (UC Berkeley)
- [pdf] [supplementary material] [video] Self-Supervised Policy Adaptation during Deployment
- Nicklas Hansen (Technical University of Denmark); Rishabh Jangir (University of California San Diego); Yu Sun; Guillem Alenyà (IRI); Pieter Abbeel (UC Berkeley); Alexei Efros (UC Berkeley); Lerrel Pinto (New York University); Xiaolong Wang (UCSD)
- [pdf] [supplementary material] [video] Trust, but verify: model-based exploration in sparse reward environments
- Konrad Czechowski (University of Warsaw); Tomasz Odrzygóźdź (Polish Academy of Sciences); Michał Izworski (University of Warsaw); Marek Zbysiński (University of Warsaw); Lukasz Kucinski (IMPAN); Piotr Miłoś (Polish Academy of Sciences)
- [pdf] [video] Model-Based Visual Planning with Self-Supervised Functional Distances
- Stephen Tian (UC Berkeley); Suraj Nair (Stanford University); Frederik Ebert (UC Berkeley); Sudeep Dasari (Carnegie Mellon University); Ben Eysenbach (Carnegie Mellon University); Chelsea Finn (Stanford); Sergey Levine (UC Berkeley)
- [pdf] [video] A Unified View of Inference-based Off-Policy RL: Decoupling Algorithmic and Implementational Sources of Performance Differences
- Hiroki Furuta (The University of Tokyo); Tadashi Kozuno (Okinawa Institute of Science and Technology); Tatsuya Matsushima (The University of Tokyo); Yutaka Matsuo (The University of Tokyo); Shixiang Gu (Google Brain)
- [pdf] [supplementary material] [video] Pairwise Weights for Temporal Credit Assignment
- Zeyu Zheng (University of Michigan); Risto Vuorio (University of Oxford); Richard Lewis (University of Michigan); Satinder Singh (UMich)
- [pdf] [video] Learning to Sample with Local and Global Contexts in Experience Replay Buffer
- Youngmin Oh (Samsung Advanced Institute of Technology); Kimin Lee (UC Berkeley); Jinwoo Shin (KAIST); Eunho Yang (KAIST;AITRICS); Sung Ju Hwang (KAIST, AITRICS)
- [pdf] [video] Adversarial Environment Generation for Learning to Navigate the Web
- Izzeddin Gur (Google); Natasha Jaques (UC Berkeley); Kevin Malta (Google); Manoj Tiwari (Google); Aleksandra Faust (Google Brain); Honglak Lee (Google / U. Michigan)
- [pdf] [video] Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning
- Shauharda Khadka (Intel Labs); Estelle Guez Aflalo (Intel Corp); Mattias Marder (Intel Corp); Avrech Ben-David (Technion); Santiago Miret (Intel Labs); Shie Mannor (Technion); Tamir Hazan (Technion); Hanlin Tang (Intel Corporation); Somdeb Majumdar (Intel Labs)*
- [pdf] [video] Causal Curiosity: RL Agents Discovering Self-supervised Experiments for Causal Representation Learning
- Sumedh Sontakke (University of Southern California); Arash Mehrjou; Laurent Itti (University of Southern California); Bernhard Schölkopf (MPI for Intelligent Systems, Tübingen)
- [pdf] [video] Optimizing Traffic Bottleneck Throughput using Cooperative, Decentralized Autonomous Vehicles
- Eugene Vinitsky (UC Berkeley); Nathan Lichtle (ENS Paris-Saclay); Kanaad Parvate (UC Berkeley); Alexandre Bayen (University of California, Berkeley)
- [pdf] [video] Reset-Free Lifelong Learning with Skill-Space Planning
- Kevin Lu (UC Berkeley); Aditya Grover (Stanford University); Pieter Abbeel (UC Berkeley); Igor Mordatch (Google)
- [pdf] [video] Mirror Descent Policy Optimization
- Manan Tomar (Facebook AI Research); Lior Shani (Technion); Yonathan Efroni (Microsoft Research); Mohammad Ghavamzadeh (Google Research)
- [pdf] [video] Utilizing Skipped Frames in Action Repeats via Pseudo-Actions
- Taisei Hashimoto (The University of Tokyo); Yoshimasa Tsuruoka (The University of Tokyo)
- [pdf] [video] Tonic: A Deep Reinforcement Learning Library for Fast Prototyping and Benchmarking
- Fabio Pardo (Imperial College London)
- [pdf] [video] Revisiting Rainbow: Promoting more insightful and inclusive deep reinforcement learning research
- Johan Obando Ceron (UAO); Pablo Samuel Castro (Google)
- [pdf] [video] MaxEnt RL and Robust Control
- Ben Eysenbach (Carnegie Mellon University); Sergey Levine (UC Berkeley)
- [pdf] Bringing order into Actor-Critic Algorithms using Stackelberg Games
- Robert Müller (Technical University of Munich)
- [pdf] [video] Reinforcement Learning with Latent Flow
- Wenling Shang (University of Amsterdam); Xiaofei Wang (University of California, Berkeley); Aravind Rajeswaran (University of Washington); Aravind Srinivas (UC Berkeley)*; Yang Gao (UC Berkeley); Michael Laskin (UC Berkeley)
- [pdf] [video] Understanding Learned Reward Functions
- Eric Michaud (University of California, Berkeley); Adam Gleave (University of California, Berkeley); Stuart Russell (UC Berkeley)
- [pdf] [video] Addressing reward bias in Adversarial Imitation Learning with neutral reward functions
- Rohit Jena (Carnegie Mellon University); Siddharth Agrawal (Carnegie Mellon University); Katia Sycara (Carnegie Mellon University)
- [pdf] [video] Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in a First-person Simulated 3D Environment
- Wilka Carvalho (University of Michigan); Anthony Liang (University of Michigan); Kimin Lee (UC Berkeley); Sungryull Sohn (University of Michigan); Honglak Lee (University of Michigan / Google Research); Richard Lewis (University of Michigan); Satinder Singh (UMich)
- [pdf] [supplementary material] [video] Efficient Competitive Self-Play Policy Optimization
- Yuanyi Zhong (University of Illinois at Urbana-Champaign); Yuan Zhou (UIUC); Jian Peng (University of Illinois at Urbana-Champaign)
- [pdf] [video] Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization
- Michael Zhang (University of Toronto); Thomas Paine (DeepMind); Ofir Nachum (Google); Cosmin Paduraru (DeepMind); George Tucker (Google Brain); Ziyu Wang (Google Research, Brain Team); Mohammad Norouzi (Google Research, Brain Team)
- [pdf] [video] Reinforcement Learning with Bayesian Classifiers: Efficient Skill Learning from Outcome Examples
- Kevin Li (UC Berkeley); Abhishek Gupta (UC Berkeley); Vitchyr Pong (UC Berkeley); Ashwin Reddy (UC Berkeley); Aurick Zhou (UC Berkeley); Justin Yu (RAIL); Sergey Levine (UC Berkeley)
- [pdf] Decoupling Representation Learning from Reinforcement Learning
- Adam Stooke (UC Berkeley); Kimin Lee (UC Berkeley); Michael Laskin (UC Berkeley)
- [pdf] [video] AWAC: Accelerating Online Reinforcement Learning With Offline Datasets
- Ashvin Nair (UC Berkeley); Murtaza Dalal (Carnegie Mellon University); Abhishek Gupta (UC Berkeley); Sergey Levine (UC Berkeley)
- [pdf] [video] Inter-Level Cooperation in Hierarchical Reinforcement Learning
- Abdul Rahman Kreidieh (UC Berkeley); Glen Berseth (University of California Berkeley); Brandon Trabucco (UC Berkeley); Samyak Parajuli (University of California, Berkeley); Sergey Levine (UC Berkeley); Alexandre Bayen (UC Berkeley)
- [pdf] [video] Model-Based Reinforcement Learning via Latent-Space Collocation
- Oleh Rybkin (University of Pennsylvania); Chuning Zhu (University of Pennsylvania); Anusha Nagabandi (UC Berkeley); Kostas Daniilidis (University of Pennsylvania); Igor Mordatch (OpenAI); Sergey Levine (University of California, Berkeley)
- [pdf] [video] Towards Effective Context for Meta-Reinforcement Learning: an Approach based on Contrastive Learning
- Haotian Fu (Tianjin University); Hongyao Tang (Tianjin University); Jianye Hao (Huawei Noah's Ark Lab); Chen Chen (Huawei Noah's Ark Lab); Xidong Feng (Department of Automation, Tsinghua University; Huawei Noah's Ark Lab); Dong Li (Huawei Noah's Ark Lab); Wulong Liu (Huawei Noah's Ark Lab)
- [pdf] [video] PettingZoo: Gym for Multi-Agent Reinforcement Learning
- J. K. Terry (Swarm Labs); Benjamin Black (Swarm Labs); Mario Jayakumar (University of Maryland, College Park); Ananth Hari (University of Maryland, College Park); Luis Santos (Swarm Labs); Clemens Dieffendahl (Technical University of Berlin); Niall Williams (University of Maryland, College Park); Yashas Lokesh (University of Maryland, College Park); Caroline Horsch (University of Maryland, College Park); Praveen Ravi (University of Maryland, College Park); Ryan Sullivan (Swarm Labs)
- [pdf] [video] DisCo RL: Distribution-Conditioned Reinforcement Learning for General-Purpose Policies
- Soroush Nasiriany (UC Berkeley); Vitchyr Pong (UC Berkeley); Ashvin Nair (UC Berkeley); Alexander Khazatsky (UC Berkeley); Glen Berseth (UC Berkeley); Sergey Levine (UC Berkeley)
- [pdf] Multi-Agent Option Critic Architecture
- Abhinav Gupta (Mila); Jhelum Chakravorty (McGill University); Jikun Kang (McGill University); Xue Liu (McGill University); Doina Precup (McGill University)
- [pdf] [video] Measuring Visual Generalization in Continuous Control from Pixels
- Jake Grigsby (University of Virginia); Yanjun Qi (University of Virginia)
- [pdf] [video] Provably Efficient Policy Optimization via Thompson Sampling
- Haque Ishfaq (Mila, McGill University); Zhuoran Yang (Princeton University); Andrei Lupu (Mila, McGill University); Viet Nguyen (Mila, McGill University); Lewis Liu (University of Montreal, Mila); Riashat Islam (Mila, McGill University); Zhaoran Wang (Northwestern); Doina Precup (McGill University)
- [pdf] [video] Outcome-Driven Reinforcement Learning via Variational Inference
- Tim G. J. Rudner (University of Oxford); Vitchyr Pong (UC Berkeley); Rowan McAllister (UC Berkeley); Yarin Gal (University of Oxford); Sergey Levine (UC Berkeley)
- [pdf] [video] Policy Learning Using Weak Supervision
- Jingkang Wang (Uber ATG, University of Toronto); Hongyi Guo (Shanghai Jiao Tong University); Zhaowei Zhu (UC Santa Cruz); Yang Liu (UC Santa Cruz)
- [pdf] [supplementary material] [video] Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments
- Jun Yamada (University of Southern California); Youngwoon Lee (University of Southern California); Gautam Salhotra (University of Southern California); Karl Pertsch (University of Southern California); Max Pflueger (University of Southern California); Gaurav Sukhatme (University of Southern California); Joseph Lim (USC); Peter Englert (University of Southern California)
- [pdf] [video] Discovery of Options via Meta-Gradients
- Vivek Veeriah (University of Michigan); Tom Zahavy (DeepMind); Matteo Hessel (DeepMind); Zhongwen Xu (DeepMind); Junhyuk Oh (DeepMind); Iurii Kemaev (DeepMind); Hado van Hasselt (DeepMind); David Silver (DeepMind); Satinder Singh (DeepMind)
- [pdf] [supplementary material] [video] SCC: an efficient deep reinforcement learning agent mastering the game of StarCraft II
- Xiangjun Wang (inspir.ai); Junxiao Song (inspir.ai)
- [pdf] [supplementary material] [video] Unsupervised Domain Adaptation for Visual Navigation
- Shangda Li (Carnegie Mellon University); Devendra Singh Chaplot (Carnegie Mellon University); Yao-Hung Tsai (Carnegie Mellon University); Yue Wu (Carnegie Mellon University); Louis-Philippe Morency (Carnegie Mellon University); Ruslan Salakhutdinov (Carnegie Mellon University)
- [pdf] [video] Continual Model-Based Reinforcement Learning with Hypernetworks
- Yizhou Huang (University of Toronto); Kevin Xie (University of Toronto); Homanga Bharadhwaj (University of Toronto, Vector Institute); Florian Shkurti (University of Toronto)
- [pdf] [supplementary material] [video] GRAC: Self-Guided and Self-Regularized Actor-Critic
- Lin Shao (Stanford University); Yifan You (UCLA); Mengyuan Yan (Stanford University); Qingyun Sun (Stanford University); Jeannette Bohg (Stanford)
- [pdf] [video] Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity
- Tanmay Gangwani (UIUC); Jian Peng (UIUC); Yuan Zhou (UIUC)
- [pdf] [video] R-LAtte: Visual Control via Deep Reinforcement Learning with Attention Network
- Mandi Zhao (UC Berkeley); Qiyang Li (University of California, Berkeley); Aravind Srinivas; Ignasi Clavera (UC Berkeley); Kimin Lee (UC Berkeley); Pieter Abbeel (UC Berkeley)
- [pdf] [video] Domain Adversarial Reinforcement Learning
- Bonnie Li (McGill); Vincent Francois-Lavet (McGill); Thang Doan (Mila / McGill); Joelle Pineau (McGill / Facebook)
- [pdf] [supplementary material] [video] Latent State Models for Meta-Reinforcement Learning from Images
- Anusha Nagabandi (UC Berkeley); Zihao Zhao (UC Berkeley); Kate Rakelly (UC Berkeley); Chelsea Finn (Stanford); Sergey Levine (UC Berkeley)
- [pdf] [video] Learning Markov State Abstractions for Deep Reinforcement Learning
- Cameron Allen (Brown University); Neev Parikh (Brown University); George Konidaris (Brown)
- [pdf] [video] Backtesting Optimal Trade Execution Policies in Agent-Based Market Simulator
- Siyu Lin (University of Virginia); Peter Beling (University of Virginia)
- [pdf] [supplementary material] [video] Deep Bayesian Quadrature Policy Optimization
- Ravi Tej Akella (Indian Institute of Technology Roorkee); Kamyar Azizzadenesheli (Purdue University); Mohammad Ghavamzadeh (Google Research); Animashree Anandkumar (Caltech); Yisong Yue (Caltech)
- [pdf] [video] Predictive PER: Balancing Priority and Diversity towards Stable Deep Reinforcement Learning
- Sanghwa Lee (National Institute of Informatics); Jaeyoung Lee (University of Waterloo); Ichiro Hasuo (National Institute of Informatics & SOKENDAI)
- [pdf] [supplementary material] [video] Value Generalization among Policies: Improving Value Function with Policy Representation
- Hongyao Tang (Tianjin University); Zhaopeng Meng (School of Computer Software, Tianjin University); Jianye Hao (Tianjin University); Chen Chen (Huawei Noah’s Ark Lab); Daniel Graves (Huawei); Dong Li (Huawei Noah's Ark Lab); Wulong Liu (Huawei Noah's Ark Lab); Yaodong Yang (Huawei Noah's Ark Lab)
- [pdf] [supplementary material] [video] Successor Landmarks for Efficient Exploration and Long-Horizon Navigation
- Christopher Hoang (University of Michigan); Sungryull Sohn (University of Michigan); Jongwook Choi (University of Michigan); Wilka Carvalho (University of Michigan); Honglak Lee (University of Michigan / Google Research)
- [pdf] [video] Policy Guided Planning in Learned Latent Space
- Mohammad Amini (Mila, McGill University); Doina Precup (McGill University); Sarath Chandar (Mila)