"reinforcement learning" Papers

300 papers found • Page 2 of 6

EvoLM: In Search of Lost Language Model Training Dynamics

Zhenting Qi, Fan Nie, Alexandre Alahi et al.

NeurIPS 2025 (oral) • arXiv:2506.16029 • 5 citations

EvolvedGRPO: Unlocking Reasoning in LVLMs via Progressive Instruction Evolution

Zhebei Shen, Qifan Yu, Juncheng Li et al.

NeurIPS 2025

Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback

Runlong Zhou, Maryam Fazel, Simon Shaolei Du

COLM 2025 • arXiv:2503.08942 • 13 citations

FFCG: Effective and Fast Family Column Generation for Solving Large-Scale Linear Program

Yi-Xiang Hu, Feng Wu, Shaoang Li et al.

AAAI 2025 • arXiv:2412.19066

From Kolmogorov to Cauchy: Shallow XNet Surpasses KANs

Xin Li, Xiaotao Zheng, Zhihong Xia

NeurIPS 2025

Generalizing Verifiable Instruction Following

Valentina Pyatkin, Saumya Malik, Victoria Graf et al.

NeurIPS 2025 • arXiv:2507.02833 • 38 citations

General-Reasoner: Advancing LLM Reasoning Across All Domains

Xueguang Ma, Qian Liu, Dongfu Jiang et al.

NeurIPS 2025 • arXiv:2505.14652 • 86 citations

GeoExplorer: Active Geo-localization with Curiosity-Driven Exploration

Li Mi, Manon Béchaz, Zeming Chen et al.

ICCV 2025 • arXiv:2508.00152

Globally Optimal Policy Gradient Algorithms for Reinforcement Learning with PID Control Policies

Vipul Sharma, Wesley Suttle, S Sivaranjani

NeurIPS 2025

GoalLadder: Incremental Goal Discovery with Vision-Language Models

Alexey Zakharov, Shimon Whiteson

NeurIPS 2025 • arXiv:2506.16396 • 1 citation

GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining

Chunyu Wei, Wenji Hu, Xingjia Hao et al.

NeurIPS 2025 • arXiv:2511.00457 • 1 citation

GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL

Lang Qin, Ziming Wang, Runhao Jiang et al.

AAAI 2025 • arXiv:2404.15597 • 3 citations

GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training

Tong Wei, Yijun Yang, Junliang Xing et al.

ICCV 2025 • arXiv:2503.08525 • 8 citations

HCRMP: An LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving

Zhiwen Chen, Hanming Deng, Zhuoren Li et al.

NeurIPS 2025 • arXiv:2505.15793 • 3 citations

Heterogeneous Graph Transformers for Simultaneous Mobile Multi-Robot Task Allocation and Scheduling under Temporal Constraints

Batuhan Altundas, Shengkang Chen, Shivika Singh et al.

NeurIPS 2025 (oral)

Hierarchical World Models as Visual Whole-Body Humanoid Controllers

Nick Hansen, Jyothir S V, Vlad Sobal et al.

ICLR 2025 • arXiv:2405.18418 • 23 citations

How Ensembles of Distilled Policies Improve Generalisation in Reinforcement Learning

Max Weltevrede, Moritz Zanger, Matthijs Spaan et al.

NeurIPS 2025 • arXiv:2505.16581

Hybrid Latent Reasoning via Reinforcement Learning

Zhenrui Yue, Bowen Jin, Huimin Zeng et al.

NeurIPS 2025 • arXiv:2505.18454 • 8 citations

HYPRL: Reinforcement Learning of Control Policies for Hyperproperties

Tzu-Han Hsu, Arshia Rafieioskouei, Borzoo Bonakdarpour

NeurIPS 2025 • arXiv:2504.04675 • 2 citations

Improving Monte Carlo Tree Search for Symbolic Regression

Zhengyao Huang, Daniel Huang, Tiannan Xiao et al.

NeurIPS 2025 • arXiv:2509.15929

Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

Yinlam Chow, Guy Tennenholtz, Izzeddin Gur et al.

ICLR 2025 • arXiv:2412.15287 • 49 citations

Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models

Cong Lu, Shengran Hu, Jeff Clune

ICLR 2025 • arXiv:2405.15143 • 27 citations

Intelligent OPC Engineer Assistant for Semiconductor Manufacturing

Guojin Chen, Haoyu Yang, Bei Yu et al.

AAAI 2025 • arXiv:2408.12775 • 2 citations

Iterative Foundation Model Fine-Tuning on Multiple Rewards

Pouya M. Ghari, Simone Sciabola, Ye Wang

NeurIPS 2025 • arXiv:2511.00220

Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning

Kaihang Pan, Yang Wu, Wendong Bu et al.

NeurIPS 2025 • arXiv:2506.01480 • 7 citations

Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks

Michael Matthews, Michael Beukman, Chris Lu et al.

ICLR 2025 • arXiv:2410.23208 • 21 citations

Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning

Seanie Lee, Minsu Kim, Lynn Cherif et al.

ICLR 2025 • arXiv:2405.18540 • 47 citations

Learning mirror maps in policy mirror descent

Carlo Alfano, Sebastian Towers, Silvia Sapora et al.

ICLR 2025 • arXiv:2402.05187 • 2 citations

Learning to Clean: Reinforcement Learning for Noisy Label Correction

Marzi Heidari, Hanping Zhang, Yuhong Guo

NeurIPS 2025 • arXiv:2511.19808

Learning to Reason for Long-Form Story Generation

Alexander Gurung, Mirella Lapata

COLM 2025 • 19 citations

Learning to Reuse Policies in State Evolvable Environments

Ziqian Zhang, Bohan Yang, Lihe Li et al.

ICML 2025 (oral)

Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration

Max Wilcoxson, Qiyang Li, Kevin Frans et al.

ICML 2025 • arXiv:2410.18076 • 7 citations

LICORICE: Label-Efficient Concept-Based Interpretable Reinforcement Learning

Zhuorui Ye, Stephanie Milani, Geoff Gordon et al.

ICLR 2025 • arXiv:2407.15786 • 5 citations

LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models

Qianyue Hao, Yiwen Song, Qingmin Liao et al.

NeurIPS 2025 (spotlight) • arXiv:2505.15293 • 3 citations

MallowsPO: Fine-Tune Your LLM with Preference Dispersions

Haoxian Chen, Hanyang Zhao, Henry Lam et al.

ICLR 2025 • arXiv:2405.14953 • 15 citations

MALT: Improving Reasoning with Multi-Agent LLM Training

Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das et al.

COLM 2025 • arXiv:2412.01928 • 37 citations

Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning

Gunshi Gupta, Karmesh Yadav, Zsolt Kira et al.

NeurIPS 2025 (spotlight) • arXiv:2510.19732

Meta-learning how to Share Credit among Macro-Actions

Ionel-Alexandru Hosu, Traian Rebedea, Razvan Pascanu

NeurIPS 2025 (oral) • arXiv:2506.13690

MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility

Wayne Wu, Honglin He, Jack He et al.

ICLR 2025 • arXiv:2407.08725 • 11 citations

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning

Xi Chen, Mingkang Zhu, Shaoteng Liu et al.

NeurIPS 2025 • arXiv:2506.22434

Modelling the control of offline processing with reinforcement learning

Eleanor Spens, Neil Burgess, Tim Behrens

NeurIPS 2025

MRO: Enhancing Reasoning in Diffusion Language Models via Multi-Reward Optimization

Chenglong Wang, Yang Gan, Hang Zhou et al.

NeurIPS 2025 • arXiv:2510.21473 • 1 citation

Multi-Agent Collaboration via Evolving Orchestration

Yufan Dang, Chen Qian, Xueheng Luo et al.

NeurIPS 2025 • arXiv:2505.19591 • 35 citations

MURKA: Multi-Reward Reinforcement Learning with Knowledge Alignment for Optimization Tasks

Wantong Xie, Yi-Xiang Hu, Jieyang Xu et al.

NeurIPS 2025

Neural Motion Simulator Pushing the Limit of World Models in Reinforcement Learning

Chenjie Hao, Weyl Lu, Yifan Xu et al.

CVPR 2025 • arXiv:2504.07095 • 5 citations

Neuroplastic Expansion in Deep Reinforcement Learning

Jiashun Liu, Johan S Obando Ceron, Aaron Courville et al.

ICLR 2025 • arXiv:2410.07994 • 13 citations

Noise-Resilient Symbolic Regression with Dynamic Gating Reinforcement Learning

Chenglu Sun, Shuo Shen, Wenzhi Tao et al.

AAAI 2025 • arXiv:2501.01085 • 5 citations

NoisyGRPO: Incentivizing Multimodal CoT Reasoning via Noise Injection and Bayesian Estimation

Longtian Qiu, Shan Ning, Jiaxuan Sun et al.

NeurIPS 2025 • arXiv:2510.21122 • 1 citation

No-Regret Thompson Sampling for Finite-Horizon Markov Decision Processes with Gaussian Processes

Jasmine Bayrooti, Sattar Vakili, Amanda Prorok et al.

NeurIPS 2025 (oral) • arXiv:2510.20725

Normalizing Flows are Capable Models for Continuous Control

Raj Ghugare, Benjamin Eysenbach

NeurIPS 2025