"reinforcement learning" Papers

300 papers found • Page 2 of 6

Filters:reinforcement learning Clear all

Conference

AAAI 2025 (3,028)COLM 2025 (418)CVPR 2025 (2,873)ICCV 2025 (2,701)ICLR 2025 (3,827)ICML 2025 (3,340)ISMAR 2025 (229)NEURIPS 2025 (5,858)AAAI 2024 (2,289)CVPR 2024 (2,716)ECCV 2024 (2,387)ICLR 2024 (2,297)ICML 2024 (2,635)

Paper Type

poster (24,624)paper (8,558)oral (1,594)spotlight (1,421)highlight (975)

EvoLM: In Search of Lost Language Model Training Dynamics

Zhenting Qi, Fan Nie, Alexandre Alahi et al.

NEURIPS 2025oralarXiv:2506.16029

citations

EvolvedGRPO: Unlocking Reasoning in LVLMs via Progressive Instruction Evolution

Zhebei Shen, Qifan Yu, Juncheng Li et al.

NEURIPS 2025

Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback

Runlong Zhou, Maryam Fazel, Simon Shaolei Du

COLM 2025paperarXiv:2503.08942

citations

FFCG: Effective and Fast Family Column Generation for Solving Large-Scale Linear Program

Yi-Xiang Hu, Feng Wu, Shaoang Li et al.

AAAI 2025paperarXiv:2412.19066

From Kolmogorov to Cauchy: Shallow XNet Surpasses KANs

Xin Li, Xiaotao Zheng, Zhihong Xia

NEURIPS 2025

Generalizing Verifiable Instruction Following

Valentina Pyatkin, Saumya Malik, Victoria Graf et al.

NEURIPS 2025arXiv:2507.02833

citations

General-Reasoner: Advancing LLM Reasoning Across All Domains

Xueguang Ma, Qian Liu, Dongfu Jiang et al.

NEURIPS 2025arXiv:2505.14652

citations

GeoExplorer: Active Geo-localization with Curiosity-Driven Exploration

Li Mi, Manon Béchaz, Zeming Chen et al.

ICCV 2025arXiv:2508.00152

Globally Optimal Policy Gradient Algorithms for Reinforcement Learning with PID Control Policies

Vipul Sharma, Wesley Suttle, S Sivaranjani

NEURIPS 2025

GoalLadder: Incremental Goal Discovery with Vision-Language Models

Alexey Zakharov, Shimon Whiteson

NEURIPS 2025arXiv:2506.16396

citations

GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining

Chunyu Wei, Wenji Hu, Xingjia Hao et al.

NEURIPS 2025arXiv:2511.00457

citations

GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL

Lang Qin, Ziming Wang, Runhao Jiang et al.

AAAI 2025paperarXiv:2404.15597

citations

GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training

Tong Wei, Yijun Yang, Junliang Xing et al.

ICCV 2025arXiv:2503.08525

citations

HCRMP: An LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving

Zhiwen Chen, Hanming Deng, Zhuoren Li et al.

NEURIPS 2025arXiv:2505.15793

citations

Heterogeneous Graph Transformers for Simultaneous Mobile Multi-Robot Task Allocation and Scheduling under Temporal Constraints

Batuhan Altundas, Shengkang Chen, Shivika Singh et al.

NEURIPS 2025oral

Hierarchical World Models as Visual Whole-Body Humanoid Controllers

Nick Hansen, Jyothir S V, Vlad Sobal et al.

ICLR 2025arXiv:2405.18418

citations

How Ensembles of Distilled Policies Improve Generalisation in Reinforcement Learning

Max Weltevrede, Moritz Zanger, Matthijs Spaan et al.

NEURIPS 2025arXiv:2505.16581

Hybrid Latent Reasoning via Reinforcement Learning

Zhenrui Yue, Bowen Jin, Huimin Zeng et al.

NEURIPS 2025arXiv:2505.18454

citations

HYPRL: Reinforcement Learning of Control Policies for Hyperproperties

Tzu-Han Hsu, Arshia Rafieioskouei, Borzoo Bonakdarpour

NEURIPS 2025arXiv:2504.04675

citations

Improving Monte Carlo Tree Search for Symbolic Regression

Zhengyao Huang, Daniel Huang, Tiannan Xiao et al.

NEURIPS 2025arXiv:2509.15929

Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

Yinlam Chow, Guy Tennenholtz, Izzeddin Gur et al.

ICLR 2025arXiv:2412.15287

citations

Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models

Cong Lu, Shengran Hu, Jeff Clune

ICLR 2025arXiv:2405.15143

citations

Intelligent OPC Engineer Assistant for Semiconductor Manufacturing

Guojin Chen, Haoyu Yang, Bei Yu et al.

AAAI 2025paperarXiv:2408.12775

citations

Iterative Foundation Model Fine-Tuning on Multiple Rewards

Pouya M. Ghari, simone sciabola, Ye Wang

NEURIPS 2025arXiv:2511.00220

Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning

Kaihang Pan, Yang Wu, Wendong Bu et al.

NEURIPS 2025arXiv:2506.01480

citations

Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks

Michael Matthews, Michael Beukman, Chris Lu et al.

ICLR 2025arXiv:2410.23208

citations

Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning

Seanie Lee, Minsu Kim, Lynn Cherif et al.

ICLR 2025arXiv:2405.18540

citations

Learning mirror maps in policy mirror descent

Carlo Alfano, Sebastian Towers, Silvia Sapora et al.

ICLR 2025arXiv:2402.05187

citations

Learning to Clean: Reinforcement Learning for Noisy Label Correction

Marzi Heidari, Hanping Zhang, Yuhong Guo

NEURIPS 2025arXiv:2511.19808

Learning to Reason for Long-Form Story Generation

Alexander Gurung, Mirella Lapata

COLM 2025paper

citations

Learning to Reuse Policies in State Evolvable Environments

Ziqian Zhang, Bohan Yang, Lihe Li et al.

ICML 2025oral

Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration

Max Wilcoxson, Qiyang Li, Kevin Frans et al.

ICML 2025arXiv:2410.18076

citations

LICORICE: Label-Efficient Concept-Based Interpretable Reinforcement Learning

Zhuorui Ye, Stephanie Milani, Geoff Gordon et al.

ICLR 2025arXiv:2407.15786

citations

LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models

Qianyue Hao, Yiwen Song, Qingmin Liao et al.

NEURIPS 2025spotlightarXiv:2505.15293

citations

MallowsPO: Fine-Tune Your LLM with Preference Dispersions

Haoxian Chen, Hanyang Zhao, Henry Lam et al.

ICLR 2025arXiv:2405.14953

citations

MALT: Improving Reasoning with Multi-Agent LLM Training

Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das et al.

COLM 2025paperarXiv:2412.01928

citations

Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning

Gunshi Gupta, Karmesh Yadav, Zsolt Kira et al.

NEURIPS 2025spotlightarXiv:2510.19732

Meta-learning how to Share Credit among Macro-Actions

Ionel-Alexandru Hosu, Traian Rebedea, Razvan Pascanu

NEURIPS 2025oralarXiv:2506.13690

MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility

Wayne Wu, Honglin He, Jack He et al.

ICLR 2025arXiv:2407.08725

citations

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning

Xi Chen, Mingkang Zhu, Shaoteng Liu et al.

NEURIPS 2025arXiv:2506.22434

Modelling the control of offline processing with reinforcement learning

Eleanor Spens, Neil Burgess, Tim Behrens

NEURIPS 2025

MRO: Enhancing Reasoning in Diffusion Language Models via Multi-Reward Optimization

Chenglong Wang, Yang Gan, Hang Zhou et al.

NEURIPS 2025arXiv:2510.21473

citations

Multi-Agent Collaboration via Evolving Orchestration

Yufan Dang, Chen Qian, Xueheng Luo et al.

NEURIPS 2025arXiv:2505.19591

citations

MURKA: Multi-Reward Reinforcement Learning with Knowledge Alignment for Optimization Tasks

WANTONG XIE, Yi-Xiang Hu, Jieyang Xu et al.

NEURIPS 2025

Neural Motion Simulator Pushing the Limit of World Models in Reinforcement Learning

Chenjie Hao, Weyl Lu, Yifan Xu et al.

CVPR 2025arXiv:2504.07095

citations

Neuroplastic Expansion in Deep Reinforcement Learning

Jiashun Liu, Johan S Obando Ceron, Aaron Courville et al.

ICLR 2025arXiv:2410.07994

citations

Noise-Resilient Symbolic Regression with Dynamic Gating Reinforcement Learning

Chenglu Sun, Shuo Shen, Wenzhi Tao et al.

AAAI 2025paperarXiv:2501.01085

citations

NoisyGRPO: Incentivizing Multimodal CoT Reasoning via Noise Injection and Bayesian Estimation

Longtian Qiu, Shan Ning, Jiaxuan Sun et al.

NEURIPS 2025arXiv:2510.21122

citations

No-Regret Thompson Sampling for Finite-Horizon Markov Decision Processes with Gaussian Processes

Jasmine Bayrooti, Sattar Vakili, Amanda Prorok et al.

NEURIPS 2025oralarXiv:2510.20725

Normalizing Flows are Capable Models for Continuous Control

Raj Ghugare, Benjamin Eysenbach

NEURIPS 2025

← Previous

1 2 3 4...6