🧬Reinforcement Learning

Multi-Agent RL

RL with multiple agents

100 papers4,496 total citations
Compare with other topics
Mar '24 Feb '26806 papers
Also includes: multi-agent reinforcement learning, marl, multi-agent systems, cooperative rl

Top Papers

#1

A Generalist Agent

Jackie Kay, Sergio Gómez Colmenarejo, Mahyar Bordbar et al.

ICLR 2024
978
citations
#2

AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors

Weize Chen, Yusheng Su, Jingwei Zuo et al.

ICLR 2024
476
citations
#3

Mixture-of-Agents Enhances Large Language Model Capabilities

Junlin Wang, Jue Wang, Ben Athiwaratkun et al.

ICLR 2025
274
citations
#4

Why Do Multi-Agent LLM Systems Fail?

Mert Cemri, Melissa Z Pan, Shuyi Yang et al.

NeurIPS 2025arXiv:2503.13657
multi-agent llm systemsfailure pattern analysissystem failure taxonomyllm-as-a-judge+3
178
citations
#5

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian et al.

ICLR 2025
127
citations
#6

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Zehan Qi, Xiao Liu, Iat Long Iong et al.

ICLR 2025arXiv:2411.02337
llm web agentsonline curriculum reinforcement learningself-evolving curriculumoutcome-supervised reward model+3
110
citations
#7

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

Weiran Yao, Shelby Heinecke, Juan Carlos Niebles et al.

ICLR 2024
104
citations
#8

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

Hanrong Zhang, Jingyuan Huang, Kai Mei et al.

ICLR 2025
103
citations
#9

Agent S: An Open Agentic Framework that Uses Computers Like a Human

Saaket Agashe, Jiuzhou Han, Shuyu Gan et al.

ICLR 2025arXiv:2410.08164
100
citations
#10

Reliable Conflictive Multi-View Learning

Cai Xu, Jiajun Si, Ziyu Guan et al.

AAAI 2024arXiv:2402.16897
multi-view learningconflictive instancesevidential learningopinion aggregation+2
88
citations
#11

OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation

Mengkang Hu, Yuhang Zhou, Wendong Fan et al.

NeurIPS 2025
78
citations
#12

WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration

Yao Zhang, Zijian Ma, Yunpu Ma et al.

AAAI 2025
74
citations
#13

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

Davide Paglieri, Bartłomiej Cupiał, Samuel Coward et al.

ICLR 2025
70
citations
#14

METRA: Scalable Unsupervised RL with Metric-Aware Abstraction

Seohong Park, Oleh Rybkin, Sergey Levine

ICLR 2024
68
citations
#15

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

Xiao Liu, Tianjie Zhang, Yu Gu et al.

ICLR 2025
67
citations
#16

GuardAgent: Safeguard LLM Agents via Knowledge-Enabled Reasoning

Zhen Xiang, Linzhi Zheng, Yanjie Li et al.

ICML 2025
66
citations
#17

LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models

Marwa Abdulhai, Isadora White, Charlie Snell et al.

ICML 2025
63
citations
#18

RE-Bench: Evaluating Frontier AI R&D Capabilities of Language Model Agents against Human Experts

Hjalmar Wijk, Tao Lin, Joel Becker et al.

ICML 2025
56
citations
#19

AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML

Patara Trirat, Wonyong Jeong, Sung Ju Hwang

ICML 2025
55
citations
#20

AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials

Yiheng Xu, Dunjie Lu, Zhennan Shen et al.

ICLR 2025
50
citations
#21

Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX

Clément Bonnet, Daniel Luo, Donal Byrne et al.

ICLR 2024
47
citations
#22

PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-agent Tasks

Matthew Chang, Gunjan Chhablani, Alexander Clegg et al.

ICLR 2025arXiv:2411.00081
human-robot collaborationembodied multi-agent taskslarge language modelstask planning+4
44
citations
#23

Self-Evolving Multi-Agent Collaboration Networks for Software Development

Yue Hu, Yuzhu Cai, Yaxin Du et al.

ICLR 2025
40
citations
#24

Dual RL: Unification and New Methods for Reinforcement and Imitation Learning

Harshit Sikchi, Qinqing Zheng, Amy Zhang et al.

ICLR 2024
39
citations
#25

ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning

Zhaorun Chen, Mintong Kang, Bo Li

ICML 2025
37
citations
#26

ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning

Ziyu Wan, Yunxiang Li, Xiaoyu Wen et al.

NeurIPS 2025arXiv:2503.09501
meta-thinkingmulti-agent reinforcement learninglarge language modelsreasoning processes+4
35
citations
#27

COMBO: Compositional World Models for Embodied Multi-Agent Cooperation

Hongxin Zhang, Zeyuan Wang, Qiushi Lyu et al.

ICLR 2025
33
citations
#28

Finite-Time Analysis of On-Policy Heterogeneous Federated Reinforcement Learning

Chenyu Zhang, Han Wang, Aritra Mitra et al.

ICLR 2024
31
citations
#29

System 1.x: Learning to Balance Fast and Slow Planning with Language Models

Swarnadeep Saha, Archiki Prasad, Justin Chen et al.

ICLR 2025
31
citations
#30

ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding

Yiyang Zhou, Yangfan He, Yaofeng Su et al.

NeurIPS 2025arXiv:2506.01300
video understandingreward-driven agentsmulti-agent frameworkvision-language models+4
29
citations
#31

Cooper: Coordinating Specialized Agents towards a Complex Dialogue Goal

Yi Cheng, Wenge Liu, Jian Wang et al.

AAAI 2024arXiv:2312.11792
complex dialogue goalsmulti-agent coordinationpersuasive dialogue systemsemotional support dialogue+4
27
citations
#32

ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents

Haiyang SHEN, Yue Li, Desong Meng et al.

ICLR 2025
26
citations
#33

KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems

Jusheng Zhang, Zimeng Huang, Yijia Fan et al.

ICML 2025
26
citations
#34

Entity-Centric Reinforcement Learning for Object Manipulation from Pixels

Dan Haramati, Tal Daniel, Aviv Tamar

ICLR 2024
25
citations
#35

ResearchTown: Simulator of Human Research Community

Haofei Yu, Zhaochen Hong, Zirui Cheng et al.

ICML 2025
25
citations
#36

Improving Agent Behaviors with RL Fine-tuning for Autonomous Driving

Zhenghao Peng, Wenjie Luo, Yiren Lu et al.

ECCV 2024arXiv:2409.18343
autonomous drivingagent behavior modelingreinforcement learning fine-tuningdistribution shift+4
23
citations
#37

G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

Guibin Zhang, Muxin Fu, Kun Wang et al.

NeurIPS 2025
22
citations
#38

Flow: Modularized Agentic Workflow Automation

Boye Niu, Yiliao Song, Kai Lian et al.

ICLR 2025
21
citations
#39

Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning

Desai Xie, Jiahao Li, Hao Tan et al.

CVPR 2024
21
citations
#40

Agent-Oriented Planning in Multi-Agent Systems

Ao LI, Yuexiang Xie, Songze Li et al.

ICLR 2025arXiv:2410.02189
multi-agent systemsagent-oriented planningtask decompositiontask allocation+3
20
citations
#41

GOAL: A Generalist Combinatorial Optimization Agent Learner

Darko Drakulić, Sofia Michel, Jean-Marc Andreoli

ICLR 2025
20
citations
#42

REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites

Div Garg, Diego Caples, Andis Draguns et al.

NeurIPS 2025arXiv:2504.11543
autonomous agentsweb navigationdeterministic simulationsmulti-turn agent evaluations+4
19
citations
#43

Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents

Arrasy Rahman, Jiaxun Cui, Peter Stone

AAAI 2024arXiv:2308.09595
ad hoc teamworkminimum coverage setrobust cooperationteammate policy diversity+4
19
citations
#44

Traffic Flow Optimisation for Lifelong Multi-Agent Path Finding

Zhe Chen, Daniel Harabor, Jiaoyang Li et al.

AAAI 2024arXiv:2308.11234
multi-agent path findingtraffic flow optimizationcollision-free path planningcongestion avoidance+4
18
citations
#45

Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds

Hao Liang, Zhiquan Luo

NeurIPS 2025arXiv:2210.14051
18
citations
#46

SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning

Wanjia Zhao, Mert Yuksekgonul, Shirley Wu et al.

NeurIPS 2025arXiv:2502.04780
multi-agent systemslarge language modelsreasoning trajectoriesself-improving systems+3
18
citations
#47

Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset

Yiming Li, Zhiheng Li, Nuo Chen et al.

CVPR 2024
17
citations
#48

Stitching Sub-trajectories with Conditional Diffusion Model for Goal-Conditioned Offline RL

Sungyoon Kim, Yunseon Choi, Daiki Matsunaga et al.

AAAI 2024arXiv:2402.07226
offline reinforcement learninggoal-conditioned rlconditional diffusion modelssub-trajectory stitching+4
17
citations
#49

RGMComm: Return Gap Minimization via Discrete Communications in Multi-Agent Reinforcement Learning

Jingdi Chen, Tian Lan, Carlee Joe-Wong

AAAI 2024arXiv:2308.03358
multi-agent reinforcement learningdiscrete communicationreturn gap minimizationonline clustering problem+4
17
citations
#50

Reinforce LLM Reasoning through Multi-Agent Reflection

Yurun Yuan, Tengyang Xie

ICML 2025
16
citations
#51

ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems

Xiangyuan Xue, Zeyu Lu, Di Huang et al.

CVPR 2025arXiv:2409.01392
15
citations
#52

Horizon Reduction Makes RL Scalable

Seohong Park, Kevin Frans, Deepinder Mann et al.

NeurIPS 2025
15
citations
#53

SAFE: Multitask Failure Detection for Vision-Language-Action Models

Qiao Gu, Yuanliang Ju, Shengxiang Sun et al.

NeurIPS 2025
15
citations
#54

AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration

Andy Zhou, Kevin Wu, Francesco Pinto et al.

NeurIPS 2025arXiv:2503.15754
autonomous red teaminglarge language modelsmulti-agent architectureattack vector discovery+3
15
citations
#55

FoX: Formation-Aware Exploration in Multi-Agent Reinforcement Learning

Yonghyeon Jo, Sunwoo Lee, Junghyuk Yum et al.

AAAI 2024arXiv:2308.11272
multi-agent reinforcement learningpartial observabilityexploration space scalabilityformation-based equivalence+4
14
citations
#56

Simulating Human-like Daily Activities with Desire-driven Autonomy

Yiding Wang, Yuxuan Chen, Fangwei Zhong et al.

ICLR 2025arXiv:2412.06435
desire-driven autonomylarge language model agentsdynamic value systemhuman-like desires+4
14
citations
#57

MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents

Yanqi Dai, Huanran Hu, Lei Wang et al.

ICLR 2025
14
citations
#58

Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF

Zhaolin Gao, Wenhao Zhan, Jonathan Chang et al.

ICLR 2025
14
citations
#59

AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench

Edan Toledo, Karen Hambardzumyan, Martin Josifoski et al.

NeurIPS 2025arXiv:2507.02554
ai research agentsautomated machine learningsearch policiesmcts algorithms+4
14
citations
#60

TANGO: Training-free Embodied AI Agents for Open-world Tasks

Filippo Ziliotto, Tommaso Campari, Luciano Serafini et al.

CVPR 2025arXiv:2412.10402
13
citations
#61

ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments

Taewoong Kim, Cheolhong Min, Byeonghwi Kim et al.

ECCV 2024
13
citations
#62

ACC-Collab: An Actor-Critic Approach to Multi-Agent LLM Collaboration

Andrew Estornell, Jean-Francois Ton, Yuanshun Yao et al.

ICLR 2025arXiv:2411.00053
13
citations
#63

Scaling Autonomous Agents via Automatic Reward Modeling And Planning

Zhenfang Chen, Delin Chen, Rui Sun et al.

ICLR 2025
13
citations
#64

Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations

Yongyuan Liang, Yanchao Sun, Ruijie Zheng et al.

ICLR 2024
12
citations
#65

Learning Efficient and Robust Multi-Agent Communication via Graph Information Bottleneck

Shifei Ding, Wei Du, Ling Ding et al.

AAAI 2024
12
citations
#66

SQLFixAgent: Towards Semantic-Accurate Text-to-SQL Parsing via Consistency-Enhanced Multi-Agent Collaboration

Jipeng Cen, Jiaxin Liu, Zhixu Li et al.

AAAI 2025
12
citations
#67

ConcaveQ: Non-monotonic Value Function Factorization via Concave Representations in Deep Multi-Agent Reinforcement Learning

Huiqun Li, Hanhan Zhou, Yifei Zou et al.

AAAI 2024arXiv:2312.15555
value function factorizationmulti-agent reinforcement learningnon-monotonic mixing functionsconcave representations+3
12
citations
#68

Skill Expansion and Composition in Parameter Space

Tenglong Liu, Jianxiong Li, Yinan Zheng et al.

ICLR 2025
11
citations
#69

RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints

Yiran Qin, Li Kang, Xiufeng Song et al.

ICCV 2025
11
citations
#70

UNEX-RL: Reinforcing Long-Term Rewards in Multi-Stage Recommender Systems with UNidirectional EXecution

Gengrui Zhang, Xiaoshuang Chen, Yao WANG et al.

AAAI 2024arXiv:2401.06470
reinforcement learningmulti-stage recommender systemsmulti-agent reinforcement learninglong-term rewards+4
11
citations
#71

Open the Black Box: Step-based Policy Updates for Temporally-Correlated Episodic Reinforcement Learning

Ge Li, Hongyi Zhou, Dominik Roth et al.

ICLR 2024
11
citations
#72

Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation

Eliot Xing, Vernon Luk, Jean Oh

ICLR 2025
11
citations
#73

Learning Decentralized Partially Observable Mean Field Control for Artificial Collective Behavior

Kai Cui, Sascha Hauck, Christian Fabian et al.

ICLR 2024
10
citations
#74

General Scene Adaptation for Vision-and-Language Navigation

Haodong Hong, Yanyuan Qiao, Sen Wang et al.

ICLR 2025arXiv:2501.17403
vision-and-language navigationscene adaptationinstruction orchestrationout-of-distribution generalization+4
10
citations
#75

Not All Tasks Are Equally Difficult: Multi-Task Deep Reinforcement Learning with Dynamic Depth Routing

Jinmin He, Kai Li, Yifan Zang et al.

AAAI 2024arXiv:2312.14472
multi-task reinforcement learningdynamic depth routingparameter sharingrouting network+3
10
citations
#76

Adaptive Anytime Multi-Agent Path Finding Using Bandit-Based Large Neighborhood Search

Thomy Phan, Taoan Huang, Bistra Dilkina et al.

AAAI 2024arXiv:2312.16767
multi-agent path findinglarge neighborhood searchbandit-based optimizationonline learning+4
10
citations
#77

Pareto Set Learning for Multi-Objective Reinforcement Learning

Erlong Liu, Yu-Chang Wu, Xiaobin Huang et al.

AAAI 2025
10
citations
#78

Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds

Zhiyong Wang, Dongruo Zhou, John C.S. Lui et al.

ICLR 2025
10
citations
#79

Smart Help: Strategic Opponent Modeling for Proactive and Adaptive Robot Assistance in Households

Zhihao Cao, ZiDong Wang, Siwen Xie et al.

CVPR 2024
10
citations
#80

Federated Contextual Cascading Bandits with Asynchronous Communication and Heterogeneous Users

Hantao Yang, Xutong Liu, Zhiyong Wang et al.

AAAI 2024arXiv:2402.16312
federated learningcontextual banditscascading banditsasynchronous communication+4
9
citations
#81

Rethinking Adversarial Policies: A Generalized Attack Formulation and Provable Defense in RL

Xiangyu Liu, Souradip Chakraborty, Yanchao Sun et al.

ICLR 2024
9
citations
#82

TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception

Zhiying Song, Lei Yang, Fuxi Wen et al.

CVPR 2025
9
citations
#83

ConfigX: Modular Configuration for Evolutionary Algorithms via Multitask Reinforcement Learning

Hongshu Guo, Zeyuan Ma, Jiacheng Chen et al.

AAAI 2025
9
citations
#84

GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling

Jialong Zhou, Lichao Wang, Xiao Yang

NeurIPS 2025
9
citations
#85

A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals

Grace Liu, Michael Tang, Benjamin Eysenbach

ICLR 2025arXiv:2408.05804
contrastive reinforcement learningskill emergencedirected explorationreward-free learning+2
9
citations
#86

MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem

Fan LIU, Zherui Yang, Cancheng Liu et al.

NeurIPS 2025
9
citations
#87

Flow-Based Policy for Online Reinforcement Learning

Lei Lv, Yunfei Li, Yu Luo et al.

NeurIPS 2025
8
citations
#88

Improved Anonymous Multi Agent Path Finding Algorithm

Zain Alabedeen Ali, Konstantin Yakovlev

AAAI 2024arXiv:2312.10572
multi-agent path findinggraph search algorithmsmakespan optimizationcollision-free path planning+4
8
citations
#89

Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees

Yifei Zhou, Ayush Sekhari, Yuda Song et al.

ICLR 2024
8
citations
#90

Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning

Haozhen Zhang, Tao Feng, Jiaxuan You

NeurIPS 2025arXiv:2506.09033
llm routingreinforcement learningsequential decision processmodel aggregation+3
8
citations
#91

SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores

Zhiyu Mei, Wei Fu, Jiaxuan Gao et al.

ICLR 2024
8
citations
#92

Settling Decentralized Multi-Agent Coordinated Exploration by Novelty Sharing

Haobin Jiang, Ziluo Ding, Zongqing Lu

AAAI 2024arXiv:2402.02097
multi-agent reinforcement learningcoordinated explorationnovelty sharingdecentralized cooperation+4
8
citations
#93

Decoding Global Preferences: Temporal and Cooperative Dependency Modeling in Multi-Agent Preference-Based Reinforcement Learning

Tianchen Zhu, Yue Qiu, Haoyi Zhou et al.

AAAI 2024
8
citations
#94

MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents

Junpeng Yue, Xinrun Xu, Börje F. Karlsson et al.

ICLR 2025arXiv:2410.03450
multimodal retrievalembodied agentspreference learningtrajectory abstraction+3
8
citations
#95

Robust Communicative Multi-Agent Reinforcement Learning with Active Defense

Lebin Yu, Yunbo Qiu, Quanming Yao et al.

AAAI 2024arXiv:2312.11545
multi-agent reinforcement learningadversarial attacksactive defense strategyagent communication+3
8
citations
#96

Offline Multi-Agent Reinforcement Learning via In-Sample Sequential Policy Optimization

Zongkai Liu, Qian Lin, Chao Yu et al.

AAAI 2025
8
citations
#97

Online Guidance Graph Optimization for Lifelong Multi-Agent Path Finding

Hongzhi Zang, Yulun Zhang, He Jiang et al.

AAAI 2025
8
citations
#98

Geometry-aware RL for Manipulation of Varying Shapes and Deformable Objects

Tai Hoang, Huy Le, Philipp Becker et al.

ICLR 2025arXiv:2502.07005
heterogeneous graph representationse(3) equivariant networksdeformable object manipulationrigid object insertion+3
8
citations
#99

(Almost Full) EFX for Three (and More) Types of Agents

Pratik Ghosal, Vishwa Prakash HV, Prajakta Nimbhorkar et al.

AAAI 2025
8
citations
#100

CaPo: Cooperative Plan Optimization for Efficient Embodied Multi-Agent Cooperation

Jie Liu, Pan Zhou, Yingjun Du et al.

ICLR 2025arXiv:2411.04679
8
citations