Exploration in RL
Exploration strategies and intrinsic motivation
Top Papers
ToolRL: Reward is All Tool Learning Needs
Cheng Qian, Emre Can Acikgoz, Qi He et al.
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
Yifei Huang, Guo Chen, Jilan Xu et al.
WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration
Yao Zhang, Zijian Ma, Yunpu Ma et al.
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
Davide Paglieri, Bartłomiej Cupiał, Samuel Coward et al.
METRA: Scalable Unsupervised RL with Metric-Aware Abstraction
Seohong Park, Oleh Rybkin, Sergey Levine
Perception-R1: Pioneering Perception Policy with Reinforcement Learning
En Yu, Kangheng Lin, Liang Zhao et al.
OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code
Maxence Faldor, Jenny Zhang, Antoine Cully et al.
Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
Mantas Mazeika, Xuwang Yin, Rishub Tamirisa et al.
System 1.x: Learning to Balance Fast and Slow Planning with Language Models
Swarnadeep Saha, Archiki Prasad, Justin Chen et al.
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Weihao Zeng, Yuzhen Huang, Lulu Zhao et al.
Online Preference Alignment for Language Models via Count-based Exploration
Chenjia Bai, Yang Zhang, Shuang Qiu et al.
Progress or Regress? Self-Improvement Reversal in Post-training
Ting Wu, Xuefeng Li, Pengfei Liu
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Bhavya Sukhija, Stelian Coros, Andreas Krause et al.
Simulating Human-like Daily Activities with Desire-driven Autonomy
Yiding Wang, Yuxuan Chen, Fangwei Zhong et al.
Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization
XiangCheng Zhang, Fang Kong, Baoxiang Wang et al.
FoX: Formation-Aware Exploration in Multi-Agent Reinforcement Learning
Yonghyeon Jo, Sunwoo Lee, Junghyuk Yum et al.
Implicit Search via Discrete Diffusion: A Study on Chess
Jiacheng Ye, Zhenyu Wu, Jiahui Gao et al.
OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments
Jinyi Liu, Zhi Wang, Yan Zheng et al.
Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward
Yanming Wan, Jiaxing Wu, Marwa Abdulhai et al.
MetaRLEC: Meta-Reinforcement Learning for Discovery of Brain Effective Connectivity
Zuozhen Zhang, Junzhong Ji, Jinduo Liu
Open the Black Box: Step-based Policy Updates for Temporally-Correlated Episodic Reinforcement Learning
Ge Li, Hongyi Zhou, Dominik Roth et al.
Skill Expansion and Composition in Parameter Space
Tenglong Liu, Jianxiong Li, Yinan Zheng et al.
ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting
Shaofei Cai, Zihao Wang, Kewei Lian et al.
Open-World Reinforcement Learning over Long Short-Term Imagination
Jiajian Li, Qi Wang, Yunbo Wang et al.
Neural Exploratory Landscape Analysis for Meta-Black-Box-Optimization
Zeyuan Ma, Jiacheng Chen, Hongshu Guo et al.
General Scene Adaptation for Vision-and-Language Navigation
Haodong Hong, Yanyuan Qiao, Sen Wang et al.
A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals
Grace Liu, Michael Tang, Benjamin Eysenbach
Delivering Inflated Explanations
Yacine Izza, Alexey Ignatiev, Peter Stuckey et al.
GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration
Yuchen Sun, Shanhui Zhao, Tao Yu et al.
Strategy Coopetition Explains the Emergence and Transience of In-Context Learning
Aaditya Singh, Ted Moskovitz, Sara Dragutinović et al.
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
Zun Wang, Jialu Li, Yicong Hong et al.
Unlocking the Power of Representations in Long-term Novelty-based Exploration
Alaa Saade, Steven Kapturowski, Daniele Calandriello et al.
Settling Decentralized Multi-Agent Coordinated Exploration by Novelty Sharing
Haobin Jiang, Ziluo Ding, Zongqing Lu
REvolve: Reward Evolution with Large Language Models using Human Feedback
Rishi Hazra, Alkis Sygkounas, Andreas Persson et al.
Online Guidance Graph Optimization for Lifelong Multi-Agent Path Finding
Hongzhi Zang, Yulun Zhang, He Jiang et al.
Regret Analysis of Repeated Delegated Choice
Suho Shin, Keivan Rezaei, Mohammad Hajiaghayi et al.
GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation
Yangtao Chen, Zixuan Chen, Junhui Yin et al.
Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration
Max Wilcoxson, Qiyang Li, Kevin Frans et al.
Among Us: A Sandbox for Measuring and Detecting Agentic Deception
Satvik Golechha, Adrià Garriga-Alonso
The Curse of Diversity in Ensemble-Based Exploration
Zhixuan Lin, Pierluca D'Oro, Evgenii Nikishin et al.
Reconciling Spatial and Temporal Abstractions for Goal Representation
Mehdi Zadem, Sergio Mover, Sao Mai Nguyen
REVECA: Adaptive Planning and Trajectory-Based Validation in Cooperative Language Agents Using Information Relevance and Relative Proximity
SeungWon Seo, SeongRae Noh, Junhyeok Lee et al.
When Maximum Entropy Misleads Policy Optimization
Ruipeng Zhang, Ya-Chien Chang, Sicun Gao
Learning to Navigate Efficiently and Precisely in Real Environments
Guillaume Bono, Hervé Poirier, Leonid Antsfeld et al.
Episodic Novelty Through Temporal Distance
Yuhua Jiang, Qihan Liu, Yiqin Yang et al.
UTILITY: Utilizing Explainable Reinforcement Learning to Improve Reinforcement Learning
Shicheng Liu, Minghui Zhu
Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations
Zilin Wang, Haolin Zhuang, Lu Li et al.
Improving Large Language Model Planning with Action Sequence Similarity
Xinran Zhao, Hanie Sedghi, Bernd Bohnet et al.
The Effective Horizon Explains Deep RL Performance in Stochastic Environments
Cassidy Laidlaw, Banghua Zhu, Stuart Russell et al.
Horizon Generalization in Reinforcement Learning
Vivek Myers, Catherine Ji, Benjamin Eysenbach
Towards Improving Exploration through Sibling Augmented GFlowNets
Kanika Madan, Alex Lamb, Emmanuel Bengio et al.
Instruction-Augmented Long-Horizon Planning: Embedding Grounding Mechanisms in Embodied Mobile Manipulation
Fangyuan Wang, Shipeng Lyu, Peng Zhou et al.
Path Choice Matters for Clear Attributions in Path Methods
Borui Zhang, Wenzhao Zheng, Jie Zhou et al.
MAGELLAN: Metacognitive predictions of learning progress guide autotelic LLM agents in large goal spaces
Loris Gaven, Thomas Carta, Clément Romac et al.
SENSEI: Semantic Exploration Guided by Foundation Models to Learn Versatile World Models
Cansu Sancaktar, Christian Gumbsch, Andrii Zadaianchuk et al.
ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning
Ruiyang Zhou, Shuozhe Li, Amy Zhang et al.
Boosting Virtual Agent Learning and Reasoning: A Step-Wise, Multi-Dimensional, and Generalist Reward Model with Benchmark
Bingchen Miao, Yang Wu, Minghe Gao et al.
Let Humanoids Hike! Integrative Skill Development on Complex Trails
Kwan-Yee Lin, Stella X. Yu
State-Covering Trajectory Stitching for Diffusion Planners
Kyowoon Lee, Jaesik Choi
Learning Uncertainty-Aware Temporally-Extended Actions
Joongkyu Lee, Seung Joon Park, Yunhao Tang et al.
Episodic Return Decomposition by Difference of Implicitly Assigned Sub-trajectory Reward
Haoxin Lin, Hongqiu Wu, Jiaji Zhang et al.
LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models
Qianyue Hao, Yiwen Song, Qingmin Liao et al.
Principal-Agent Bandit Games with Self-Interested and Exploratory Learning Agents
Junyan Liu, Lillian Ratliff
ELEMENTAL: Interactive Learning from Demonstrations and Vision-Language Models for Reward Design in Robotics
Letian Chen, Nina Moorman, Matthew Gombolay
Towards Synergistic Path-based Explanations for Knowledge Graph Completion: Exploration and Evaluation
Tengfei Ma, Xiang Song, Wen Tao et al.
Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling
Jasmine Bayrooti, Carl Ek, Amanda Prorok
State Entropy Regularization for Robust Reinforcement Learning
Yonatan Ashlag, Uri Koren, Mirco Mutti et al.
Deep RL Needs Deep Behavior Analysis: Exploring Implicit Planning by Model-Free Agents in Open-Ended Environments
Riley Simmons-Edler, Ryan Badman, Felix Berg et al.
Explaining Decisions of Agents in Mixed-Motive Games
Maayan Orner, Oleg Maksimov, Akiva Kleinerman et al.
The impact of uncertainty on regularized learning in games
Pierre-Louis Cauvin, Davide Legacci, Panayotis Mertikopoulos
RLZero: Direct Policy Inference from Language Without In-Domain Supervision
Harshit Sushil Sikchi, Siddhant Agarwal, Pranaya Jajoo et al.
Reasoning in Visual Navigation of End-to-end Trained Agents: A Dynamical Systems Approach
Steeven Janny, Hervé Poirier, Leonid Antsfeld et al.
Toward Efficient Multi-Agent Exploration With Trajectory Entropy Maximization
Tianxu Li, Kun Zhu
Task Planning for Object Rearrangement in Multi-Room Environments
Karan Mirakhor, Sourav Ghosh, Dipanjan Das et al.
Learning More Expressive General Policies for Classical Planning Domains
Simon Ståhlberg, Blai Bonet, Hector Geffner
Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration
Heyang Zhao, Xingrui Yu, David Bossens et al.
Risk-averse Total-reward MDPs with ERM and EVaR
Xihong Su, Marek Petrik, Julien Grand-Clément
Understanding Constraint Inference in Safety-Critical Inverse Reinforcement Learning
Bo Yue, Shufan Wang, Ashish Gaurav et al.
BEE: Metric-Adapted Explanations via Baseline Exploration-Exploitation
Oren Barkan, Yehonatan Elisha, Jonathan Weill et al.
Action abstractions for amortized sampling
Oussama Boussif, Léna Ezzine, Joseph Viviano et al.
Walk Wisely on Graph: Knowledge Graph Reasoning with Dual Agents via Efficient Guidance-Exploration
Zijian Wang, Bin Wang, Haifeng Jing et al.
Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks
Ziping Xu, Zifan Xu, Runxuan Jiang et al.
Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games
Runyu Lu, Peng Zhang, Ruochuan Shi et al.
Factorio Learning Environment
Jack Hopkins, Mart Bakler, Akbir Khan
Of Mice and Machines: A Comparison of Learning Between Real World Mice and RL Agents
Shuo Han, Germán Espinosa, Junda Huang et al.
OptionZero: Planning with Learned Options
Po-Wei Huang, Pei-Chiun Peng, Hung Guei et al.
Achieving Sample and Computational Efficient Reinforcement Learning by Action Space Reduction via Grouping
Yining Li, Peizhong Ju, Ness Shroff
Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning
Cheng Chen, Yunpeng Zhai, Yifan Zhao et al.
Safety Representations for Safer Policy Learning
Kaustubh Mani, Vincent Mai, Charlie Gauthier et al.
KEA: Keeping Exploration Alive by Proactively Coordinating Exploration Strategies
Shih-Min Yang, Martin Magnusson, Johannes Stork et al.
Minimax Optimal Reinforcement Learning with Quasi-Optimism
Harin Lee, Min-hwan Oh
Action-Dependent Optimality-Preserving Reward Shaping
Grant Forbes, Jianxun Wang, Leonardo Villalobos-Arias et al.
Long-Term Experiences From Working with Extended Reality in the Wild
Verena Biener, Florian Jack Winston, Dieter Schmalstieg et al.
Behavioral Exploration: Learning to Explore via In-Context Adaptation
Andrew Wagenmaker, Zhiyuan Zhou, Sergey Levine
Learning a Fast Mixing Exogenous Block MDP using a Single Trajectory
Alexander Levine, Peter Stone, Amy Zhang
Intrinsic Benefits of Categorical Distributional Loss: Uncertainty-aware Regularized Exploration in Reinforcement Learning
Ke Sun, Yingnan Zhao, Enze Shi et al.
Towards Empowerment Gain through Causal Structure Learning in Model-Based Reinforcement Learning
Hongye Cao, Fan Feng, Meng Fang et al.
Estimating cognitive biases with attention-aware inverse planning
Sounak Banerjee, Daphne Cornelisse, Deepak Gopinath et al.
MALinZero: Efficient Low-Dimensional Search for Mastering Complex Multi-Agent Planning
Sizhe Tang, Jiayu Chen, Tian Lan
Adaptive Frontier Exploration on Graphs with Applications to Network-Based Disease Testing
XianJun, Davin Choo, Yuqi Pan, Tonghan Wang et al.