🧬 Reinforcement Learning

Exploration in RL

Exploration strategies and intrinsic motivation

100 papers · 1,131 total citations
Feb '24 – Jan '26 · 314 papers
Also includes: exploration, curiosity-driven, intrinsic motivation, exploration bonus
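
The keyword cluster above (curiosity-driven learning, intrinsic motivation, exploration bonuses, and the count-based exploration of paper #11 below) shares one core idea: augment the task reward with an intrinsic term that pays the agent for visiting rarely seen states. As a minimal illustrative sketch only, and not the method of any specific paper listed here, a count-based bonus of the form beta / sqrt(N(s)) can be added to the extrinsic reward; the class name, the beta value, and the hashable state_key argument are assumptions made for the example.

```python
# Minimal sketch of a count-based exploration bonus (illustrative only;
# not taken from any of the papers listed on this page).
from collections import defaultdict
import math


class CountBonus:
    """Adds beta / sqrt(N(s)) to the extrinsic reward for discrete states."""

    def __init__(self, beta: float = 0.1):
        self.beta = beta                # bonus scale (assumed hyperparameter)
        self.counts = defaultdict(int)  # visit counts N(s)

    def shaped_reward(self, state_key, extrinsic_reward: float) -> float:
        # state_key must be hashable, e.g. a tuple of discretized observations
        self.counts[state_key] += 1
        bonus = self.beta / math.sqrt(self.counts[state_key])
        return extrinsic_reward + bonus


# Usage inside an RL training loop: wrap the environment reward.
bonus = CountBonus(beta=0.1)
r = bonus.shaped_reward(state_key=(3, 5), extrinsic_reward=0.0)
```

Many of the papers below replace raw counts with learned density models, prediction error, information gain, or temporal distance, but the reward-shaping hook is the same.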

Top Papers

#1

ToolRL: Reward is All Tool Learning Needs

Cheng Qian, Emre Can Acikgoz, Qi He et al.

NeurIPS 2025 · 152 citations
#2

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World

Yifei Huang, Guo Chen, Jilan Xu et al.

CVPR 2024 · 84 citations
#3

WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration

Yao Zhang, Zijian Ma, Yunpu Ma et al.

AAAI 2025 · 74 citations
#4

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

Davide Paglieri, Bartłomiej Cupiał, Samuel Coward et al.

ICLR 2025 · 70 citations
#5

METRA: Scalable Unsupervised RL with Metric-Aware Abstraction

Seohong Park, Oleh Rybkin, Sergey Levine

ICLR 2024 · 68 citations
#6

Perception-R1: Pioneering Perception Policy with Reinforcement Learning

En Yu, Kangheng Lin, Liang Zhao et al.

NeurIPS 2025 · arXiv:2504.07954 · 58 citations
#7

OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code

Maxence Faldor, Jenny Zhang, Antoine Cully et al.

ICLR 2025 · 44 citations
#8

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Mantas Mazeika, Xuwang Yin, Rishub Tamirisa et al.

NeurIPS 2025 · 31 citations
#9

System 1.x: Learning to Balance Fast and Slow Planning with Language Models

Swarnadeep Saha, Archiki Prasad, Justin Chen et al.

ICLR 2025 · 31 citations
#10

B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Weihao Zeng, Yuzhen Huang, Lulu Zhao et al.

ICLR 2025 · 23 citations
#11

Online Preference Alignment for Language Models via Count-based Exploration

Chenjia Bai, Yang Zhang, Shuang Qiu et al.

ICLR 2025 · 19 citations
#12

Progress or Regress? Self-Improvement Reversal in Post-training

Ting Wu, Xuefeng Li, Pengfei Liu

ICLR 2025 · 18 citations
#13

MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Bhavya Sukhija, Stelian Coros, Andreas Krause et al.

ICLR 2025 · 18 citations
#14

Simulating Human-like Daily Activities with Desire-driven Autonomy

Yiding Wang, Yuxuan Chen, Fangwei Zhong et al.

ICLR 2025 · 14 citations
#15

Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization

XiangCheng Zhang, Fang Kong, Baoxiang Wang et al.

ICLR 2025 · 14 citations
#16

FoX: Formation-Aware Exploration in Multi-Agent Reinforcement Learning

Yonghyeon Jo, Sunwoo Lee, Junghyuk Yum et al.

AAAI 2024 · arXiv:2308.11272 · 14 citations
Keywords: multi-agent reinforcement learning, partial observability, exploration space scalability, formation-based equivalence (+4 more)
#17

Implicit Search via Discrete Diffusion: A Study on Chess

Jiacheng Ye, Zhenyu Wu, Jiahui Gao et al.

ICLR 2025 · 13 citations
#18

OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments

Jinyi Liu, Zhi Wang, Yan Zheng et al.

AAAI 2024 · arXiv:2312.12145 · 13 citations
Keywords: reinforcement learning, optimistic exploration, continuous control, environmental stochasticity (+3 more)
#19

Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward

Yanming Wan, Jiaxing Wu, Marwa Abdulhai et al.

NeurIPS 2025 · arXiv:2504.03206 · 12 citations
Keywords: personalized dialogue systems, multi-turn reinforcement learning, curiosity reward mechanism, user modeling (+4 more)
#20

MetaRLEC: Meta-Reinforcement Learning for Discovery of Brain Effective Connectivity

Zuozhen Zhang, Junzhong Ji, Jinduo Liu

AAAI 2024 · 11 citations
#21

Open the Black Box: Step-based Policy Updates for Temporally-Correlated Episodic Reinforcement Learning

Ge Li, Hongyi Zhou, Dominik Roth et al.

ICLR 2024 · 11 citations
#22

Skill Expansion and Composition in Parameter Space

Tenglong Liu, Jianxiong Li, Yinan Zheng et al.

ICLR 2025 · 11 citations
#23

ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting

Shaofei Cai, Zihao Wang, Kewei Lian et al.

CVPR 2025 · 11 citations
#24

Open-World Reinforcement Learning over Long Short-Term Imagination

Jiajian Li, Qi Wang, Yunbo Wang et al.

ICLR 2025 · 10 citations
#25

Neural Exploratory Landscape Analysis for Meta-Black-Box-Optimization

Zeyuan Ma, Jiacheng Chen, Hongshu Guo et al.

ICLR 2025 · arXiv:2408.10672 · 10 citations
Keywords: meta-black-box optimization, exploratory landscape analysis, attention-based neural network, multi-task neuroevolution (+3 more)
#26

General Scene Adaptation for Vision-and-Language Navigation

Haodong Hong, Yanyuan Qiao, Sen Wang et al.

ICLR 2025 · 10 citations
#27

A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals

Grace Liu, Michael Tang, Benjamin Eysenbach

ICLR 2025 · arXiv:2408.05804 · 10 citations
Keywords: contrastive reinforcement learning, skill emergence, directed exploration, reward-free learning (+2 more)
#28

Delivering Inflated Explanations

Yacine Izza, Alexey Ignatiev, Peter Stuckey et al.

AAAI 2024 · arXiv:2306.15272 · 10 citations
Keywords: explainable artificial intelligence, formal abductive explanations, feature value analysis, decision boundary analysis (+4 more)
#29

GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration

Yuchen Sun, Shanhui Zhao, Tao Yu et al.

CVPR 2025 · 10 citations
#30

Strategy Coopetition Explains the Emergence and Transience of In-Context Learning

Aaditya Singh, Ted Moskovitz, Sara Dragutinović et al.

ICML 2025 · 9 citations
#31

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

Zun Wang, Jialu Li, Yicong Hong et al.

ICLR 2025 · 9 citations
#32

Unlocking the Power of Representations in Long-term Novelty-based Exploration

Alaa Saade, Steven Kapturowski, Daniele Calandriello et al.

ICLR 2024 · 9 citations
#33

Settling Decentralized Multi-Agent Coordinated Exploration by Novelty Sharing

Haobin Jiang, Ziluo Ding, Zongqing Lu

AAAI 2024 · arXiv:2402.02097 · 8 citations
Keywords: multi-agent reinforcement learning, coordinated exploration, novelty sharing, decentralized cooperation (+4 more)
#34

REvolve: Reward Evolution with Large Language Models using Human Feedback

Rishi Hazra, Alkis Sygkounas, Andreas Persson et al.

ICLR 2025 · 8 citations
#35

Online Guidance Graph Optimization for Lifelong Multi-Agent Path Finding

Hongzhi Zang, Yulun Zhang, He Jiang et al.

AAAI 2025 · 8 citations
#36

Regret Analysis of Repeated Delegated Choice

Suho Shin, Keivan Rezaei, Mohammad Hajiaghayi et al.

AAAI 2024 · arXiv:2310.04884 · 7 citations
Keywords: repeated delegated choice, online learning variant, regret analysis, strategic agent behavior (+4 more)
#37

GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation

Yangtao Chen, Zixuan Chen, Junhui Yin et al.

ICLR 2025 · 7 citations
#38

Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration

Max Wilcoxson, Qiyang Li, Kevin Frans et al.

ICML 2025 · 7 citations
#39

Among Us: A Sandbox for Measuring and Detecting Agentic Deception

Satvik Golechha, Adrià Garriga-Alonso

NeurIPS 2025 · arXiv:2504.04072 · 7 citations
Keywords: agentic deception, language-based ai agents, social deception game, multi-player game (+4 more)
#40

The Curse of Diversity in Ensemble-Based Exploration

Zhixuan Lin, Pierluca D'Oro, Evgenii Nikishin et al.

ICLR 2024 · 6 citations
#41

Reconciling Spatial and Temporal Abstractions for Goal Representation

Mehdi Zadem, Sergio Mover, Sao Mai Nguyen

ICLR 2024 · 6 citations
#42

REVECA: Adaptive Planning and Trajectory-Based Validation in Cooperative Language Agents Using Information Relevance and Relative Proximity

SeungWon Seo, SeongRae Noh, Junhyeok Lee et al.

AAAI 2025 · 6 citations
#43

When Maximum Entropy Misleads Policy Optimization

Ruipeng Zhang, Ya-Chien Chang, Sicun Gao

ICML 2025 · 6 citations
#44

Learning to Navigate Efficiently and Precisely in Real Environments

Guillaume Bono, Hervé Poirier, Leonid Antsfeld et al.

CVPR 2024 · 5 citations
#45

Episodic Novelty Through Temporal Distance

Yuhua Jiang, Qihan Liu, Yiqin Yang et al.

ICLR 2025 · 5 citations
#46

UTILITY: Utilizing Explainable Reinforcement Learning to Improve Reinforcement Learning

Shicheng Liu, Minghui Zhu

ICLR 2025 · 5 citations
#47

Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations

Zilin Wang, Haolin Zhuang, Lu Li et al.

AAAI 2024 · arXiv:2312.11442 · 5 citations
Keywords: 3d dance generation, reward model training, reinforcement learning, music-conditioned generation (+4 more)
#48

Improving Large Language Model Planning with Action Sequence Similarity

Xinran Zhao, Hanie Sedghi, Bernd Bohnet et al.

ICLR 2025 · arXiv:2505.01009 · 5 citations
Keywords: in-context learning, exemplar selection, action sequence similarity, large language model planning (+3 more)
#49

The Effective Horizon Explains Deep RL Performance in Stochastic Environments

Cassidy Laidlaw, Banghua Zhu, Stuart Russell et al.

ICLR 2024 · 5 citations
#50

Horizon Generalization in Reinforcement Learning

Vivek Myers, Catherine Ji, Benjamin Eysenbach

ICLR 2025 · arXiv:2501.02709 · 5 citations
Keywords: goal-conditioned reinforcement learning, horizon generalization, planning invariance, goal-directed policies (+2 more)
#51

Towards Improving Exploration through Sibling Augmented GFlowNets

Kanika Madan, Alex Lamb, Emmanuel Bengio et al.

ICLR 2025 · 5 citations
#52

Instruction-Augmented Long-Horizon Planning: Embedding Grounding Mechanisms in Embodied Mobile Manipulation

Fangyuan Wang, Shipeng Lyu, Peng Zhou et al.

AAAI 2025 · 5 citations
#53

Path Choice Matters for Clear Attributions in Path Methods

Borui Zhang, Wenzhao Zheng, Jie Zhou et al.

ICLR 2024 · 4 citations
#54

MAGELLAN: Metacognitive predictions of learning progress guide autotelic LLM agents in large goal spaces

Loris Gaven, Thomas Carta, Clément Romac et al.

ICML 2025 · 4 citations
#55

SENSEI: Semantic Exploration Guided by Foundation Models to Learn Versatile World Models

Cansu Sancaktar, Christian Gumbsch, Andrii Zadaianchuk et al.

ICML 2025 · 4 citations
#56

ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning

Ruiyang Zhou, Shuozhe Li, Amy Zhang et al.

NeurIPS 2025 · 4 citations
#57

Boosting Virtual Agent Learning and Reasoning: A Step-Wise, Multi-Dimensional, and Generalist Reward Model with Benchmark

Bingchen Miao, Yang Wu, Minghe Gao et al.

ICML 2025 · 4 citations
#58

Let Humanoids Hike! Integrative Skill Development on Complex Trails

Kwan-Yee Lin, Stella X. Yu

CVPR 2025 · arXiv:2505.06218 · 4 citations
Keywords: humanoid robot locomotion, hierarchical reinforcement learning, temporal vision transformer, privileged learning (+4 more)
#59

State-Covering Trajectory Stitching for Diffusion Planners

Kyowoon Lee, Jaesik Choi

NeurIPS 2025 · 4 citations
#60

Learning Uncertainty-Aware Temporally-Extended Actions

Joongkyu Lee, Seung Joon Park, Yunhao Tang et al.

AAAI 2024 · arXiv:2402.05439 · 3 citations
Keywords: reinforcement learning, temporal abstraction, action repetition, uncertainty estimation (+3 more)
#61

Episodic Return Decomposition by Difference of Implicitly Assigned Sub-trajectory Reward

Haoxin Lin, Hongqiu Wu, Jiaji Zhang et al.

AAAI 2024 · arXiv:2312.10642 · 3 citations
Keywords: delayed rewards, episodic return decomposition, reinforcement learning, sample efficiency (+2 more)
#62

LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models

Qianyue Hao, Yiwen Song, Qingmin Liao et al.

NeurIPS 2025 · 3 citations
#63

Principal-Agent Bandit Games with Self-Interested and Exploratory Learning Agents

Junyan Liu, Lillian Ratliff

ICML 2025 · 3 citations
#64

ELEMENTAL: Interactive Learning from Demonstrations and Vision-Language Models for Reward Design in Robotics

Letian Chen, Nina Moorman, Matthew Gombolay

ICML 2025 · 3 citations
#65

Towards Synergistic Path-based Explanations for Knowledge Graph Completion: Exploration and Evaluation

Tengfei Ma, Xiang Song, Wen Tao et al.

ICLR 2025 · 3 citations
#66

Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling

Jasmine Bayrooti, Carl Ek, Amanda Prorok

ICLR 2025 · 3 citations
#67

State Entropy Regularization for Robust Reinforcement Learning

Yonatan Ashlag, Uri Koren, Mirco Mutti et al.

NeurIPS 2025 · 3 citations
#68

Deep RL Needs Deep Behavior Analysis: Exploring Implicit Planning by Model-Free Agents in Open-Ended Environments

Riley Simmons-Edler, Ryan Badman, Felix Berg et al.

NeurIPS 2025 · 3 citations
#69

Explaining Decisions of Agents in Mixed-Motive Games

Maayan Orner, Oleg Maksimov, Akiva Kleinerman et al.

AAAI 2025 · 3 citations
#70

The impact of uncertainty on regularized learning in games

Pierre-Louis Cauvin, Davide Legacci, Panayotis Mertikopoulos

ICML 2025 · 3 citations
#71

RLZero: Direct Policy Inference from Language Without In-Domain Supervision

Harshit Sushil Sikchi, Siddhant Agarwal, Pranaya Jajoo et al.

NeurIPS 2025 · 3 citations
#72

Reasoning in Visual Navigation of End-to-end Trained Agents: A Dynamical Systems Approach

Steeven Janny, Hervé Poirier, Leonid Antsfeld et al.

CVPR 2025 · 3 citations
#73

Toward Efficient Multi-Agent Exploration With Trajectory Entropy Maximization

Tianxu Li, Kun Zhu

ICLR 2025 · 2 citations
Keywords: multi-agent reinforcement learning, decentralized policies, trajectory entropy maximization, contrastive trajectory representation (+4 more)
#74

Task Planning for Object Rearrangement in Multi-Room Environments

Karan Mirakhor, Sourav Ghosh, Dipanjan Das et al.

AAAI 2024 · arXiv:2406.00451 · 2 citations
Keywords: object rearrangement, multi-room environments, hierarchical task planning, large language models (+4 more)
#75

Learning More Expressive General Policies for Classical Planning Domains

Simon Ståhlberg, Blai Bonet, Hector Geffner

AAAI 2025 · 2 citations
#76

Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration

Heyang Zhao, Xingrui Yu, David Bossens et al.

ICLR 2025 · 2 citations
#77

Risk-averse Total-reward MDPs with ERM and EVaR

Xihong Su, Marek Petrik, Julien Grand-Clément

AAAI 2025 · 2 citations
#78

Understanding Constraint Inference in Safety-Critical Inverse Reinforcement Learning

Bo Yue, Shufan Wang, Ashish Gaurav et al.

ICLR 2025 · 2 citations
#79

BEE: Metric-Adapted Explanations via Baseline Exploration-Exploitation

Oren Barkan, Yehonatan Elisha, Jonathan Weill et al.

AAAI 2025 · 2 citations
#80

Action abstractions for amortized sampling

Oussama Boussif, Léna Ezzine, Joseph Viviano et al.

ICLR 2025 · 2 citations
#81

Walk Wisely on Graph: Knowledge Graph Reasoning with Dual Agents via Efficient Guidance-Exploration

Zijian Wang, Bin Wang, Haifeng Jing et al.

AAAI 2025 · 2 citations
#82

Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks

Ziping Xu, Zifan Xu, Runxuan Jiang et al.

ICLR 2024 · 2 citations
#83

Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games

Runyu Lu, Peng Zhang, Ruochuan Shi et al.

NeurIPS 2025 · 2 citations
#84

Factorio Learning Environment

Jack Hopkins, Mart Bakler, Akbir Khan

NeurIPS 2025 · 2 citations
#85

Of Mice and Machines: A Comparison of Learning Between Real World Mice and RL Agents

Shuo Han, German Espinosa, Junda Huang et al.

ICML 2025 · 2 citations
#86

OptionZero: Planning with Learned Options

Po-Wei Huang, Pei-Chiun Peng, Hung Guei et al.

ICLR 2025 · 2 citations
#87

Achieving Sample and Computational Efficient Reinforcement Learning by Action Space Reduction via Grouping

Yining Li, Peizhong Ju, Ness Shroff

ICLR 2024 · 1 citation
#88

Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning

Cheng Chen, Yunpeng Zhai, Yifan Zhao et al.

CVPR 2025 · 1 citation
#89

Safety Representations for Safer Policy Learning

Kaustubh Mani, Vincent Mai, Charlie Gauthier et al.

ICLR 2025 · 1 citation
#90

KEA: Keeping Exploration Alive by Proactively Coordinating Exploration Strategies

Shih-Min Yang, Martin Magnusson, Johannes Stork et al.

ICML 2025 · 1 citation
#91

Minimax Optimal Reinforcement Learning with Quasi-Optimism

Harin Lee, Min-hwan Oh

ICLR 2025 · 1 citation
#92

Action-Dependent Optimality-Preserving Reward Shaping

Grant Forbes, Jianxun Wang, Leonardo Villalobos-Arias et al.

ICML 2025 · 1 citation
#93

Long-Term Experiences From Working with Extended Reality in the Wild

Verena Biener, Florian Jack Winston, Dieter Schmalstieg et al.

ISMAR 2025 · 1 citation
#94

Behavioral Exploration: Learning to Explore via In-Context Adaptation

Andrew Wagenmaker, Zhiyuan Zhou, Sergey Levine

ICML 2025 · 1 citation
#95

Learning a Fast Mixing Exogenous Block MDP using a Single Trajectory

Alexander Levine, Peter Stone, Amy Zhang

ICLR 2025 · 1 citation
#96

Intrinsic Benefits of Categorical Distributional Loss: Uncertainty-aware Regularized Exploration in Reinforcement Learning

Ke Sun, Yingnan Zhao, Enze Shi et al.

NeurIPS 2025 · 1 citation
#97

Towards Empowerment Gain through Causal Structure Learning in Model-Based Reinforcement Learning

Hongye Cao, Fan Feng, Meng Fang et al.

ICLR 2025 · 1 citation
Keywords: causal structure learning, model-based reinforcement learning, intrinsic motivation, empowerment maximization (+3 more)
#98

Estimating cognitive biases with attention-aware inverse planning

Sounak Banerjee, Daphne Cornelisse, Deepak Gopinath et al.

NeurIPS 2025 · 1 citation
#99

MALinZero: Efficient Low-Dimensional Search for Mastering Complex Multi-Agent Planning

Sizhe Tang, Jiayu Chen, Tian Lan

NeurIPS 2025 · arXiv:2511.06142 · 1 citation
Keywords: monte carlo tree search, multi-agent planning, contextual linear bandit, low-dimensional representation (+4 more)
#100

Adaptive Frontier Exploration on Graphs with Applications to Network-Based Disease Testing

XianJun, Davin Choo, Yuqi Pan, Tonghan Wang et al.

NeurIPS 2025 · 1 citation