2025 "reinforcement learning" Papers

181 papers found • Page 4 of 4

SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning

Jiaqi Chen, Bang Zhang, Ruotian Ma et al.

NEURIPS 2025posterarXiv:2504.19162
21
citations

SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Peixian Ma, Xialie Zhuang, Chengjin Xu et al.

NEURIPS 2025posterarXiv:2504.08600
46
citations

Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation

Eliot Xing, Vernon Luk, Jean Oh

ICLR 2025posterarXiv:2412.12089
11
citations

Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning

Hung Le, Dung Nguyen, Kien Do et al.

ICLR 2025posterarXiv:2410.10132
6
citations

Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models

Hoang Khoi Nguyen Do, Truc Nguyen, Malik Hassanaly et al.

ICLR 2025posterarXiv:2503.06413
2
citations

SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond

Junteng Liu, Yuanxiang Fan, Jiang Zhuo et al.

NEURIPS 2025posterarXiv:2505.19641
22
citations

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

Dongzhi JIANG, Ziyu Guo, Renrui Zhang et al.

NEURIPS 2025posterarXiv:2505.00703
97
citations

Teaching Models to Improve on Tape

Liat Bezalel, Eyal Orgad, Amir Globerson

AAAI 2025paperarXiv:2411.01483

Temporal Difference Learning: Why It Can Be Fast and How It Will Be Faster

Patrick Schnell, Luca Guastoni, Nils Thuerey

ICLR 2025oral

The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise

Shuze Daniel Liu, Shuhang Chen, Shangtong Zhang

NEURIPS 2025oralarXiv:2401.07844
13
citations

The Promise of RL for Autoregressive Image Editing

Saba Ahmadi, Rabiul Awal, Ankur Sikarwar et al.

NEURIPS 2025posterarXiv:2508.01119
2
citations

The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning

Xinyu Zhu, Mengzhou Xia, Zhepei Wei et al.

NEURIPS 2025posterarXiv:2506.01347
74
citations

Thinker: Learning to Think Fast and Slow

Stephen Chung, Wenyu Du, Jie Fu

NEURIPS 2025posterarXiv:2505.21097
7
citations

Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding

Ye Wang, Ziheng Wang, Boshen Xu et al.

NEURIPS 2025oralarXiv:2503.13377
42
citations

Training-Free Generation of Temporally Consistent Rewards from VLMs

Yinuo Zhao, Jiale Yuan, Zhiyuan Xu et al.

ICCV 2025posterarXiv:2507.04789
2
citations

Training Language Models to Self-Correct via Reinforcement Learning

Aviral Kumar, Vincent Zhuang, Rishabh Agarwal et al.

ICLR 2025posterarXiv:2409.12917
318
citations

Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards

Xiaoyuan Liu, Tian Liang, Zhiwei He et al.

NEURIPS 2025posterarXiv:2505.13445
16
citations

Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound

Tal Fiskus, Uri Shaham

NEURIPS 2025posterarXiv:2507.11269

Unified Reinforcement and Imitation Learning for Vision-Language Models

Byung-Kwan Lee, Ryo Hachiuma, Yong Man Ro et al.

NEURIPS 2025posterarXiv:2510.19307
2
citations

UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping

Wenbo Wang, Fangyun Wei, Lei Zhou et al.

CVPR 2025posterarXiv:2412.02699
16
citations

Uni-RL: Unifying Online and Offline RL via Implicit Value Regularization

Haoran Xu, Liyuan Mao, Hui Jin et al.

NEURIPS 2025poster

Unlocking Multimodal Mathematical Reasoning via Process Reward Model

Ruilin Luo, Zhuofan Zheng, Lei Wang et al.

NEURIPS 2025posterarXiv:2501.04686
29
citations

VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception

Ziang Yan, Yinan He, Xinhao Li et al.

NEURIPS 2025oralarXiv:2509.21100
13
citations

VIKI‑R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning

Li Kang, Xiufeng Song, Heng Zhou et al.

NEURIPS 2025posterarXiv:2506.09049
8
citations

Vinci: Deep Thinking in Text-to-Image Generation using Unified Model with Reinforcement Learning

wang lin, Wentao Hu, Liyu Jia et al.

NEURIPS 2025poster

VisRL: Intention-Driven Visual Perception via Reinforced Reasoning

Zhangquan Chen, Xufang Luo, Dongsheng Li

ICCV 2025posterarXiv:2503.07523
24
citations

ViUniT: Visual Unit Tests for More Robust Visual Programming

Artemis Panagopoulou, Honglu Zhou, silvio savarese et al.

CVPR 2025posterarXiv:2412.08859
2
citations

VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Haozhe Wang, Chao Qu, Zuming Huang et al.

NEURIPS 2025spotlightarXiv:2504.08837
175
citations

VTDexManip: A Dataset and Benchmark for Visual-tactile Pretraining and Dexterous Manipulation with Reinforcement Learning

Qingtao Liu, Yu Cui, Zhengnan Sun et al.

ICLR 2025poster
11
citations

WebDancer: Towards Autonomous Information Seeking Agency

Jialong Wu, Baixuan Li, Runnan Fang et al.

NEURIPS 2025posterarXiv:2505.22648
81
citations

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

Haipeng Luo, Qingfeng Sun, Can Xu et al.

ICLR 2025posterarXiv:2308.09583
644
citations