Current Research I'm currently interested developing novel reinforcement learning algorithms that scale better to difficult problems, particularly involving large language models and interaction with humans.
Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space Joey Hong ,Kang Liu ,Zhan Ling ,Jiecao Chen ,Sergey Levine under submission , 2025 arXiv code website
Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL Joey Hong ,Anca Dragan ,Sergey Levine NeurIPS , 2025 arXiv website
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning Joey Hong ,Anca Dragan ,Sergey Levine ICLR , 2025 arXiv
Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations Joey Hong ,Sergey Levine ,Anca Dragan NeurIPS Foundation Models for Decision Making Workshop , 2024 arXiv ,slides
Offline RL with Observation Histories: Analyzing and Improving Sample Complexity Joey Hong ,Anca Dragan ,Sergey Levine ICLR , 2024 arXiv
Learning to Influence Human Behavior with Offline Reinforcement Learning Joey Hong ,Sergey Levine ,Anca Dragan NeurIPS , 2023 arXiv , website
Confidence-Conditioned Value Functions for Offline Reinforcement Learning Joey Hong ,Aviral Kumar ,Sergey Levine ICLR , 2022 (oral) arXiv
On the Sensitivity of Reward Inference to Misspecified Human Models Joey Hong ,Kush Bhatia ,Anca Dragan ICLR , 2022 (oral) arXiv
When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning? Aviral Kumar* ,Joey Hong* ,Anikait Singh ,Sergey Levine ICLR , 2021 arXiv ,blog