🧬Reinforcement Learning

Multi-Agent RL

RL with multiple agents

100 papers, 4,943 total citations
Publication trend, Feb '24 to Jan '26: 848 papers
Also includes: multi-agent reinforcement learning, marl, multi-agent systems, cooperative rl

Top Papers

#1

A Generalist Agent

Jackie Kay, Sergio Gómez Colmenarejo, Mahyar Bordbar et al.

ICLR 2024
978 citations
#2

AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors

Weize Chen, Yusheng Su, Jingwei Zuo et al.

ICLR 2024
476 citations
#3

Mixture-of-Agents Enhances Large Language Model Capabilities

Junlin Wang, Jue Wang, Ben Athiwaratkun et al.

ICLR 2025
274 citations
#4

Why Do Multi-Agent LLM Systems Fail?

Mert Cemri, Melissa Z Pan, Shuyi Yang et al.

NeurIPS 2025, arXiv:2503.13657
Keywords: multi-agent llm systems, failure pattern analysis, system failure taxonomy, llm-as-a-judge (+3 more)
188 citations
#5

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian et al.

ICLR 2025
127 citations
#6

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Zehan Qi, Xiao Liu, Iat Long Iong et al.

ICLR 2025
110 citations
#7

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

Weiran Yao, Shelby Heinecke, Juan Carlos Niebles et al.

ICLR 2024
104 citations
#8

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

Hanrong Zhang, Jingyuan Huang, Kai Mei et al.

ICLR 2025
103 citations
#9

Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control

Longtao Zheng, Rundong Wang, Xinrun Wang et al.

ICLR 2024
103 citations
#10

Agent S: An Open Agentic Framework that Uses Computers Like a Human

Saaket Agashe, Jiuzhou Han, Shuyu Gan et al.

ICLR 2025
100 citations
#11

Reliable Conflictive Multi-View Learning

Cai Xu, Jiajun Si, Ziyu Guan et al.

AAAI 2024, arXiv:2402.16897
Keywords: multi-view learning, conflictive instances, evidential learning, opinion aggregation (+2 more)
88 citations
#12

OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation

Mengkang Hu, Yuhang Zhou, Wendong Fan et al.

NeurIPS 2025
78 citations
#13

OGBench: Benchmarking Offline Goal-Conditioned RL

Seohong Park, Kevin Frans, Benjamin Eysenbach et al.

ICLR 2025, arXiv:2410.20092
Keywords: offline reinforcement learning, goal-conditioned rl, benchmark evaluation, offline gcrl algorithms (+3 more)
74 citations
#14

WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration

Yao Zhang, Zijian Ma, Yunpu Ma et al.

AAAI 2025
74 citations
#15

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

Davide Paglieri, Bartłomiej Cupiał, Samuel Coward et al.

ICLR 2025
70 citations
#16

METRA: Scalable Unsupervised RL with Metric-Aware Abstraction

Seohong Park, Oleh Rybkin, Sergey Levine

ICLR 2024
68 citations
#17

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

Xiao Liu, Tianjie Zhang, Yu Gu et al.

ICLR 2025
67 citations
#18

AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents

Ke Yang, Yao Liu, Sapana Chaudhary et al.

ICLR 2025, arXiv:2410.13825
Keywords: web agent grounding, observation space alignment, action space alignment, llm-based agents (+4 more)
66 citations
#19

GuardAgent: Safeguard LLM Agents via Knowledge-Enabled Reasoning

Zhen Xiang, Linzhi Zheng, Yanjie Li et al.

ICML 2025
66 citations
#20

LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models

Marwa Abdulhai, Isadora White, Charlie Snell et al.

ICML 2025
63 citations
#21

DSBench: How Far Are Data Science Agents from Becoming Data Science Experts?

Liqiang Jing, Zhehui Huang, Xiaoyang Wang et al.

ICLR 2025, arXiv:2409.07703
Keywords: data science agents, large language models, large vision-language models, data analysis tasks (+4 more)
62 citations
#22

Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation

Hyungjoo Chae, Namyoung Kim, Kai Ong et al.

ICLR 2025, arXiv:2410.13232
Keywords: web navigation agents, world models, large language models, autonomous agents (+4 more)
59 citations
#23

RE-Bench: Evaluating Frontier AI R&D Capabilities of Language Model Agents against Human Experts

Hjalmar Wijk, Tao Lin, Joel Becker et al.

ICML 2025
56 citations
#24

AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML

Patara Trirat, Wonyong Jeong, Sung Ju Hwang

ICML 2025
55 citations
#25

AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials

Yiheng Xu, Dunjie Lu, Zhennan Shen et al.

ICLR 2025
50 citations
#26

Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX

Clément Bonnet, Daniel Luo, Donal Byrne et al.

ICLR 2024
47 citations
#27

PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-agent Tasks

Matthew Chang, Gunjan Chhablani, Alexander Clegg et al.

ICLR 2025
44 citations
#28

Self-Evolving Multi-Agent Collaboration Networks for Software Development

Yue Hu, Yuzhu Cai, Yaxin Du et al.

ICLR 2025
40 citations
#29

Dual RL: Unification and New Methods for Reinforcement and Imitation Learning

Harshit Sikchi, Qinqing Zheng, Amy Zhang et al.

ICLR 2024
39 citations
#30

ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning

Zhaorun Chen, Mintong Kang, Bo Li

ICML 2025
37 citations
#31

ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning

Ziyu Wan, Yunxiang Li, Xiaoyu Wen et al.

NeurIPS 2025, arXiv:2503.09501
Keywords: meta-thinking, multi-agent reinforcement learning, large language models, reasoning processes (+4 more)
36 citations
#32

V-IRL: Grounding Virtual Intelligence in Real Life

Jihan YANG, Runyu Ding, Ellis L Brown et al.

ECCV 2024, arXiv:2402.03310
Keywords: embodied ai agents, virtual environments, real-world interaction, perception and decision-making (+4 more)
35 citations
#33

COMBO: Compositional World Models for Embodied Multi-Agent Cooperation

Hongxin Zhang, Zeyuan Wang, Qiushi Lyu et al.

ICLR 2025
33 citations
#34

Finite-Time Analysis of On-Policy Heterogeneous Federated Reinforcement Learning

Chenyu Zhang, Han Wang, Aritra Mitra et al.

ICLR 2024
31 citations
#35

System 1.x: Learning to Balance Fast and Slow Planning with Language Models

Swarnadeep Saha, Archiki Prasad, Justin Chen et al.

ICLR 2025
31 citations
#36

Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs

Qi Wu, Yubo Zhao, Yifan Wang et al.

ICLR 2025, arXiv:2405.17013
Keywords: human motion generation, conversational framework, motion editing, motion understanding (+4 more)
30 citations
#37

ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding

Yiyang Zhou, Yangfan He, Yaofeng Su et al.

NeurIPS 2025
29 citations
#38

Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning

Yiqun Chen, Lingyong Yan, Weiwei Sun et al.

NeurIPS 2025, arXiv:2501.15228
Keywords: retrieval-augmented generation, multi-agent reinforcement learning, query rewriting, document retrieval (+3 more)
27 citations
#39

Cooper: Coordinating Specialized Agents towards a Complex Dialogue Goal

Yi Cheng, Wenge Liu, Jian Wang et al.

AAAI 2024, arXiv:2312.11792
Keywords: complex dialogue goals, multi-agent coordination, persuasive dialogue systems, emotional support dialogue (+4 more)
27 citations
#40

KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems

Jusheng Zhang, Zimeng Huang, Yijia Fan et al.

ICML 2025
26 citations
#41

ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents

Haiyang SHEN, Yue Li, Desong Meng et al.

ICLR 2025
26 citations
#42

Multi-Agent Collaboration via Evolving Orchestration

Yufan Dang, Chen Qian, Xueheng Luo et al.

NeurIPS 2025, arXiv:2505.19591
Keywords: multi-agent collaboration, large language models, reinforcement learning, dynamic orchestration (+2 more)
25 citations
#43

Entity-Centric Reinforcement Learning for Object Manipulation from Pixels

Dan Haramati, Tal Daniel, Aviv Tamar

ICLR 2024
25 citations
#44

ResearchTown: Simulator of Human Research Community

Haofei Yu, Zhaochen Hong, Zirui Cheng et al.

ICML 2025
25 citations
#45

Improving Agent Behaviors with RL Fine-tuning for Autonomous Driving

Zhenghao Peng, Wenjie Luo, Yiren Lu et al.

ECCV 2024
23 citations
#46

G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

Guibin Zhang, Muxin Fu, Kun Wang et al.

NeurIPS 2025
22 citations
#47

Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning

Desai Xie, Jiahao Li, Hao Tan et al.

CVPR 2024
21 citations
#48

Flow: Modularized Agentic Workflow Automation

Boye Niu, Yiliao Song, Kai Lian et al.

ICLR 2025
21 citations
#49

GOAL: A Generalist Combinatorial Optimization Agent Learner

Darko Drakulić, Sofia Michel, Jean-Marc Andreoli

ICLR 2025
20 citations
#50

Agent-Oriented Planning in Multi-Agent Systems

Ao LI, Yuexiang Xie, Songze Li et al.

ICLR 2025
20 citations
#51

Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks

Michael Matthews, Michael Beukman, Chris Lu et al.

ICLR 2025, arXiv:2410.23208
Keywords: reinforcement learning, physics-based control, procedural generation, hardware-accelerated simulation (+4 more)
20 citations
#52

Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents

Arrasy Rahman, Jiaxun Cui, Peter Stone

AAAI 2024, arXiv:2308.09595
Keywords: ad hoc teamwork, minimum coverage set, robust cooperation, teammate policy diversity (+4 more)
19 citations
#53

REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites

Div Garg, Diego Caples, Andis Draguns et al.

NeurIPS 2025, arXiv:2504.11543
Keywords: autonomous agents, web navigation, deterministic simulations, multi-turn agent evaluations (+4 more)
19 citations
#54

Traffic Flow Optimisation for Lifelong Multi-Agent Path Finding

Zhe Chen, Daniel Harabor, Jiaoyang Li et al.

AAAI 2024, arXiv:2308.11234
Keywords: multi-agent path finding, traffic flow optimization, collision-free path planning, congestion avoidance (+4 more)
18 citations
#55

SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning

Wanjia Zhao, Mert Yuksekgonul, Shirley Wu et al.

NeurIPS 2025
18 citations
#56

Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)

Zhenjie Yang, Xiaosong Jia, Qifeng Li et al.

NeurIPS 2025, arXiv:2505.16394
Keywords: reinforcement learning, autonomous driving, world models, model-based reinforcement learning (+4 more)
18 citations
#57

Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds

Hao Liang, Zhiquan Luo

NeurIPS 2025
18 citations
#58

Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset

Yiming Li, Zhiheng Li, Nuo Chen et al.

CVPR 2024
17 citations
#59

Stitching Sub-trajectories with Conditional Diffusion Model for Goal-Conditioned Offline RL

Sungyoon Kim, Yunseon Choi, Daiki Matsunaga et al.

AAAI 2024, arXiv:2402.07226
Keywords: offline reinforcement learning, goal-conditioned rl, conditional diffusion models, sub-trajectory stitching (+4 more)
17 citations
#60

RGMComm: Return Gap Minimization via Discrete Communications in Multi-Agent Reinforcement Learning

Jingdi Chen, Tian Lan, Carlee Joe-Wong

AAAI 2024, arXiv:2308.03358
Keywords: multi-agent reinforcement learning, discrete communication, return gap minimization, online clustering problem (+4 more)
17 citations
#61

Reinforce LLM Reasoning through Multi-Agent Reflection

Yurun Yuan, Tengyang Xie

ICML 2025
16 citations
#62

AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration

Andy Zhou, Kevin Wu, Francesco Pinto et al.

NeurIPS 2025
15 citations
#63

ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems

Xiangyuan Xue, Zeyu Lu, Di Huang et al.

CVPR 2025
15 citations
#64

Horizon Reduction Makes RL Scalable

Seohong Park, Kevin Frans, Deepinder Mann et al.

NeurIPS 2025
15 citations
#65

SAFE: Multitask Failure Detection for Vision-Language-Action Models

Qiao Gu, Yuanliang Ju, Shengxiang Sun et al.

NeurIPS 2025
15 citations
#66

AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench

Edan Toledo, Karen Hambardzumyan, Martin Josifoski et al.

NeurIPS 2025, arXiv:2507.02554
Keywords: ai research agents, automated machine learning, search policies, mcts algorithms (+4 more)
15 citations
#67

Simulating Human-like Daily Activities with Desire-driven Autonomy

Yiding Wang, Yuxuan Chen, Fangwei Zhong et al.

ICLR 2025
14 citations
#68

FoX: Formation-Aware Exploration in Multi-Agent Reinforcement Learning

Yonghyeon Jo, Sunwoo Lee, Junghyuk Yum et al.

AAAI 2024, arXiv:2308.11272
Keywords: multi-agent reinforcement learning, partial observability, exploration space scalability, formation-based equivalence (+4 more)
14 citations
#69

MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents

Yanqi Dai, Huanran Hu, Lei Wang et al.

ICLR 2025
14 citations
#70

Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF

Zhaolin Gao, Wenhao Zhan, Jonathan Chang et al.

ICLR 2025
14 citations
#71

ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments

Taewoong Kim, Cheolhong Min, Byeonghwi Kim et al.

ECCV 2024
13 citations
#72

TANGO: Training-free Embodied AI Agents for Open-world Tasks

Filippo Ziliotto, Tommaso Campari, Luciano Serafini et al.

CVPR 2025
13 citations
#73

ACC-Collab: An Actor-Critic Approach to Multi-Agent LLM Collaboration

Andrew Estornell, Jean-Francois Ton, Yuanshun Yao et al.

ICLR 2025
13 citations
#74

Scaling Autonomous Agents via Automatic Reward Modeling And Planning

Zhenfang Chen, Delin Chen, Rui Sun et al.

ICLR 2025
13 citations
#75

Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations

Yongyuan Liang, Yanchao Sun, Ruijie Zheng et al.

ICLR 2024
12 citations
#76

ConcaveQ: Non-monotonic Value Function Factorization via Concave Representations in Deep Multi-Agent Reinforcement Learning

Huiqun Li, Hanhan Zhou, Yifei Zou et al.

AAAI 2024, arXiv:2312.15555
Keywords: value function factorization, multi-agent reinforcement learning, non-monotonic mixing functions, concave representations (+3 more)
12 citations
#77

Learning Efficient and Robust Multi-Agent Communication via Graph Information Bottleneck

Shifei Ding, Wei Du, Ling Ding et al.

AAAI 2024
12 citations
#78

SQLFixAgent: Towards Semantic-Accurate Text-to-SQL Parsing via Consistency-Enhanced Multi-Agent Collaboration

Jipeng Cen, Jiaxin Liu, Zhixu Li et al.

AAAI 2025
12 citations
#79

RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints

Yiran Qin, Li Kang, Xiufeng Song et al.

ICCV 2025
11 citations
#80

Open the Black Box: Step-based Policy Updates for Temporally-Correlated Episodic Reinforcement Learning

Ge Li, Hongyi Zhou, Dominik Roth et al.

ICLR 2024
11 citations
#81

UNEX-RL: Reinforcing Long-Term Rewards in Multi-Stage Recommender Systems with UNidirectional EXecution

Gengrui Zhang, Xiaoshuang Chen, Yao WANG et al.

AAAI 2024, arXiv:2401.06470
Keywords: reinforcement learning, multi-stage recommender systems, multi-agent reinforcement learning, long-term rewards (+4 more)
11 citations
#82

Skill Expansion and Composition in Parameter Space

Tenglong Liu, Jianxiong Li, Yinan Zheng et al.

ICLR 2025
11 citations
#83

Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation

Eliot Xing, Vernon Luk, Jean Oh

ICLR 2025
11 citations
#84

Learning Decentralized Partially Observable Mean Field Control for Artificial Collective Behavior

Kai Cui, Sascha Hauck, Christian Fabian et al.

ICLR 2024
10 citations
#85

General Scene Adaptation for Vision-and-Language Navigation

Haodong Hong, Yanyuan Qiao, Sen Wang et al.

ICLR 2025
10 citations
#86

DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents

Hao Li, Xiaogeng Liu, CHIU Chun et al.

NeurIPS 2025, arXiv:2506.12104
Keywords: prompt injection attacks, agentic systems security, dynamic rule enforcement, memory stream isolation (+4 more)
10 citations
#87

Pareto Set Learning for Multi-Objective Reinforcement Learning

Erlong Liu, Yu-Chang Wu, Xiaobin Huang et al.

AAAI 2025
10 citations
#88

Adaptive Anytime Multi-Agent Path Finding Using Bandit-Based Large Neighborhood Search

Thomy Phan, Taoan Huang, Bistra Dilkina et al.

AAAI 2024, arXiv:2312.16767
Keywords: multi-agent path finding, large neighborhood search, bandit-based optimization, online learning (+4 more)
10 citations
#89

Not All Tasks Are Equally Difficult: Multi-Task Deep Reinforcement Learning with Dynamic Depth Routing

Jinmin He, Kai Li, Yifan Zang et al.

AAAI 2024, arXiv:2312.14472
Keywords: multi-task reinforcement learning, dynamic depth routing, parameter sharing, routing network (+3 more)
10 citations
#90

Smart Help: Strategic Opponent Modeling for Proactive and Adaptive Robot Assistance in Households

Zhihao Cao, ZiDong Wang, Siwen Xie et al.

CVPR 2024
10 citations
#91

A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals

Grace Liu, Michael Tang, Benjamin Eysenbach

ICLR 2025, arXiv:2408.05804
Keywords: contrastive reinforcement learning, skill emergence, directed exploration, reward-free learning (+2 more)
10 citations
#92

Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds

Zhiyong Wang, Dongruo Zhou, John C.S. Lui et al.

ICLR 2025
10 citations
#93

Rethinking Adversarial Policies: A Generalized Attack Formulation and Provable Defense in RL

Xiangyu Liu, Souradip Chakraborty, Yanchao Sun et al.

ICLR 2024
9 citations
#94

BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems

Andy Zhang, Joey Ji, Celeste Menders et al.

NeurIPS 2025, arXiv:2505.15216
Keywords: cybersecurity ai agents, vulnerability detection, bug bounty programs, exploit generation (+4 more)
9 citations
#95

ConfigX: Modular Configuration for Evolutionary Algorithms via Multitask Reinforcement Learning

Hongshu Guo, Zeyuan Ma, Jiacheng Chen et al.

AAAI 2025
9 citations
#96

TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception

Zhiying Song, Lei Yang, Fuxi Wen et al.

CVPR 2025
9 citations
#97

Federated Contextual Cascading Bandits with Asynchronous Communication and Heterogeneous Users

Hantao Yang, Xutong Liu, Zhiyong Wang et al.

AAAI 2024, arXiv:2402.16312
Keywords: federated learning, contextual bandits, cascading bandits, asynchronous communication (+4 more)
9 citations
#98

GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling

Jialong Zhou, Lichao Wang, Xiao Yang

NeurIPS 2025
9 citations
#99

MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem

Fan LIU, Zherui Yang, Cancheng Liu et al.

NeurIPS 2025
9 citations
#100

Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees

Yifei Zhou, Ayush Sekhari, Yuda Song et al.

ICLR 2024
8 citations