Model-Based RL
RL with learned world models
Top Papers
WorldSimBench: Towards Video Generation Models as World Simulators
Yiran Qin, Zhelun Shi, Jiwen Yu et al.
Learning Interactive Real-World Simulators
Sherry Yang, Yilun Du, Seyed Ghasemipour et al.
TD-MPC2: Scalable, Robust World Models for Continuous Control
Nicklas Hansen, Hao Su, Xiaolong Wang
OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving
Wenzhao Zheng, Weiliang Chen, Yuanhui Huang et al.
ToolRL: Reward is All Tool Learning Needs
Cheng Qian, Emre Can Acikgoz, Qi He et al.
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
Hiroki Furuta, Kuang-Huei Lee, Ofir Nachum et al.
Navigation World Models
Amir Bar, Gaoyue Zhou, Danny Tran et al.
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
Juan Rocamonde, Victoriano Montesinos, Elvis Nava et al.
Interpreting Emergent Planning in Model-Free Reinforcement Learning
Thomas Bush, Stephen Chung, Usman Anwar et al.
Towards Learning a Generalist Model for Embodied Navigation
Duo Zheng, Shijia Huang, Lin Zhao et al.
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Zehan Qi, Xiao Liu, Iat Long Iong et al.
RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding
Jihan Yang, Runyu Ding, Weipeng Deng et al.
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Mingjie Liu, Shizhe Diao, Ximing Lu et al.
Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion
Lunjun Zhang, Yuwen Xiong, Ze Yang et al.
OGBench: Benchmarking Offline Goal-Conditioned RL
Seohong Park, Kevin Frans, Benjamin Eysenbach et al.
Confronting Reward Model Overoptimization with Constrained RLHF
Ted Moskovitz, Aaditya Singh, DJ Strouse et al.
METRA: Scalable Unsupervised RL with Metric-Aware Abstraction
Seohong Park, Oleh Rybkin, Sergey Levine
HAMSTER: Hierarchical Action Models for Open-World Robot Manipulation
Yi Li, Yuquan Deng, Jesse Zhang et al.
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
Hyungjoo Chae, Namyoung Kim, Kai Ong et al.
Perception-R1: Pioneering Perception Policy with Reinforcement Learning
En Yu, Kangheng Lin, Liang Zhao et al.
CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control
Guy Tevet, Sigal Raab, Setareh Cohan et al.
VinePPO: Refining Credit Assignment in RL Training of LLMs
Amirhossein Kazemnejad, Milad Aghajohari, Eva Portelance et al.
Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better
Danny Driess, Jost Springenberg, Brian Ichter et al.
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers
Yuntao Chen, Yuqi Wang, Zhaoxiang Zhang
Think2Drive: Efficient Reinforcement Learning by Thinking with Latent World Model for Autonomous Driving (in CARLA-v2)
Qifeng Li, Xiaosong Jia, Shaobo Wang et al.
Learning 4D Embodied World Models
Haoyu Zhen, Qiao Sun, Hongxin Zhang et al.
Dual RL: Unification and New Methods for Reinforcement and Imitation Learning
Harshit Sikchi, Qinqing Zheng, Amy Zhang et al.
STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?
Yun Li, Yiming Zhang, Tao Lin et al.
SafeDreamer: Safe Reinforcement Learning with World Models
Weidong Huang, Jiaming Ji, Chunhe Xia et al.
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Xuehai He, Weixi Feng, Kaizhi Zheng et al.
COMBO: Compositional World Models for Embodied Multi-Agent Cooperation
Hongxin Zhang, Zeyuan Wang, Qiushi Lyu et al.
Distilling Autoregressive Models to Obtain High-Performance Non-autoregressive Solvers for Vehicle Routing Problems with Faster Inference Speed
Yubin Xiao, Di Wang, Boyang Li et al.
System 1.x: Learning to Balance Fast and Slow Planning with Language Models
Swarnadeep Saha, Archiki Prasad, Justin Chen et al.
WorldModelBench: Judging Video Generation Models As World Models
Dacheng Li, Yunhao Fang, Yukang Chen et al.
PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning
Qingdong He, Jiangning Zhang, Jinlong Peng et al.
Long-Context State-Space Video World Models
Ryan Po, Yotam Nitzan, Richard Zhang et al.
KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems
Jusheng Zhang, Zimeng Huang, Yijia Fan et al.
A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
Yinmin Zhang, Jie Liu, Chuming Li et al.
Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination
Leonardo Barcellona, Andrii Zadaianchuk, Davide Allegro et al.
AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning
Duojun Huang, Xinyu Xiong, Jie Ma et al.
Pre-Training Goal-based Models for Sample-Efficient Reinforcement Learning
Haoqi Yuan, Zhancun Mu, Feiyang Xie et al.
Improving Agent Behaviors with RL Fine-tuning for Autonomous Driving
Zhenghao Peng, Wenjie Luo, Yiren Lu et al.
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
Dongping Chen, Yue Huang, Siyuan Wu et al.
Domain Prompt Learning with Quaternion Networks
Qinglong Cao, Zhengqin Xu, Yuntian Chen et al.
Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning
Desai Xie, Jiahao Li, Hao Tan et al.
Reinforced Lifelong Editing for Language Models
Zherui Li, Houcheng Jiang, Hao Chen et al.
Navigation Instruction Generation with BEV Perception and Large Language Models
Sheng Fan, Rui Liu, Wenguan Wang et al.
Hierarchical World Models as Visual Whole-Body Humanoid Controllers
Nick Hansen, Jyothir S V, Vlad Sobal et al.
Efficient Reinforcement Learning with Large Language Model Priors
Xue Yan, Yan Song, Xidong Feng et al.
Locality Sensitive Sparse Encoding for Learning World Models Online
Zichen Liu, Chao Du, Wee Sun Lee et al.
Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)
Zhenjie Yang, Xiaosong Jia, Qifeng Li et al.
Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds
Hao Liang, Zhi-Quan Luo
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Bhavya Sukhija, Stelian Coros, Andreas Krause et al.
Cross-Embodiment Dexterous Grasping with Reinforcement Learning
Haoqi Yuan, Bohan Zhou, Yuhui Fu et al.
COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL
Xiyao Wang, Ruijie Zheng, Yanchao Sun et al.
Zero-shot forecasting of chaotic systems
Yuanzhao Zhang, William Gilpin
GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation
Hongyin Zhang, Pengxiang Ding, Shangke Lyu et al.
Stitching Sub-trajectories with Conditional Diffusion Model for Goal-Conditioned Offline RL
Sungyoon Kim, Yunseon Choi, Daiki Matsunaga et al.
Learning 3D Persistent Embodied World Models
Siyuan Zhou, Yilun Du, Yuncong Yang et al.
CORN: Contact-based Object Representation for Nonprehensile Manipulation of General Unseen Objects
Yoonyoung Cho, Junhyek Han, Yoontae Cho et al.
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
Sagnik Mukherjee, Lifan Yuan, Dilek Hakkani-Tur et al.
Learning Optimal Advantage from Preferences and Mistaking It for Reward
W Bradley Knox, Stephane Hatgis-Kessell, Sigurdur Orn Adalgeirsson et al.
GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models
Mianchu Wang, Rui Yang, Xi Chen et al.
Horizon Reduction Makes RL Scalable
Seohong Park, Kevin Frans, Deepinder Mann et al.
AutoToM: Scaling Model-based Mental Inference via Automated Agent Modeling
Zhining Zhang, Chuanyang Jin, Mung Yao Jia et al.
RoboScape: Physics-informed Embodied World Model
Yu Shang, Xin Zhang, Yinzhou Tang et al.
ReCoRe: Regularized Contrastive Representation Learning of World Model
Rudra P. K. Poudel, Harit Pandya et al.
Reinforcement Learning Friendly Vision-Language Model for Minecraft
Haobin Jiang, Junpeng Yue, Hao Luo et al.
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning
Xiao Liang, Zhong-Zhi Li, Yeyun Gong et al.
Force Prompting: Video Generation Models Can Learn And Generalize Physics-based Control Signals
Nate Gillman, Charles Herrmann, Michael Freeman et al.
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
Jian Yang, Dacheng Yin, Yizhou Zhou et al.
MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
Yuncong Yang, Jiageng Liu, Zheyuan Zhang et al.
AdaWM: Adaptive World Model based Planning for Autonomous Driving
Hang Wang, Xin Ye, Feng Tao et al.
ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments
Taewoong Kim, Cheolhong Min, Byeonghwi Kim et al.
TANGO: Training-free Embodied AI Agents for Open-world Tasks
Filippo Ziliotto, Tommaso Campari, Luciano Serafini et al.
Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations
Yongyuan Liang, Yanchao Sun, Ruijie Zheng et al.
Learning Transformer-based World Models with Contrastive Predictive Coding
Maxime Burchi, Radu Timofte
Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data
Chongyi Zheng, Benjamin Eysenbach, Homer Walke et al.
Open the Black Box: Step-based Policy Updates for Temporally-Correlated Episodic Reinforcement Learning
Ge Li, Hongyi Zhou, Dominik Roth et al.
Learning Decentralized Partially Observable Mean Field Control for Artificial Collective Behavior
Kai Cui, Sascha Hauck, Christian Fabian et al.
DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation
Jiangran Lyu, Ziming Li, Xuesong Shi et al.
Learning Over Molecular Conformer Ensembles: Datasets and Benchmarks
Yanqiao Zhu, Jeehyun Hwang, Keir Adams et al.
Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds
Zhiyong Wang, Dongruo Zhou, John C.S. Lui et al.
$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training
Jin Zhou, Kaiwen Wang, Jonathan Chang et al.
Open-World Reinforcement Learning over Long Short-Term Imagination
Jiajian Li, Qi Wang, Yunbo Wang et al.
Fast training and sampling of Restricted Boltzmann Machines
Nicolas Bereux, Aurélien Decelle, Cyril Furtlehner et al.
Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning
Zizhao Wang, Caroline Wang, Xuesu Xiao et al.
Rapidly Adapting Policies to the Real-World via Simulation-Guided Fine-Tuning
Patrick Yin, Tyler Westenbroek, Ching-An Cheng et al.
Solver-Informed RL: Grounding Large Language Models for Authentic Optimization Modeling
Yitian Chen, Jingfan Xia, Siyu Shao et al.
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics
Dongyoung Kim, Huiwon Jang, Sumin Park et al.
Random-Set Neural Networks
Shireen Kudukkil Manchingal, Muhammad Mubashar, Kaizheng Wang et al.
ConfigX: Modular Configuration for Evolutionary Algorithms via Multitask Reinforcement Learning
Hongshu Guo, Zeyuan Ma, Jiacheng Chen et al.
Learning with a Mole: Transferable latent spatial representations for navigation without reconstruction
Guillaume Bono, Leonid Antsfeld, Assem Sadek et al.
DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing
Vint Lee, Pieter Abbeel, Youngwoon Lee
Rashomon Sets for Prototypical-Part Networks: Editing Interpretable Models in Real-Time
Jon Donnelly, Zhicheng Guo, Alina Jade Barnett et al.
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
Tong Wei, Yijun Yang, Junliang Xing et al.
VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents
Kangrui Wang, Pingyue Zhang, Zihan Wang et al.
Geometry-aware RL for Manipulation of Varying Shapes and Deformable Objects
Tai Hoang, Huy Le, Philipp Becker et al.
Flow-Based Policy for Online Reinforcement Learning
Lei Lv, Yunfei Li, Yu Luo et al.
Learning World Models for Interactive Video Generation
Taiye Chen, Xun Hu, Zihan Ding et al.