NeurIPS 2025 "policy optimization" Papers

12 papers found

A Differential and Pointwise Control Approach to Reinforcement Learning

Minh Nguyen, Chandrajit Bajaj

NeurIPS 2025 · poster · arXiv:2404.15617 · 1 citation

CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models

Zhihang Lin, Mingbao Lin, Yuan Xie et al.

NeurIPS 2025 · poster · arXiv:2503.22342 · 47 citations

EconGym: A Scalable AI Testbed with Diverse Economic Tasks

Qirui Mi, Qipeng Yang, Zijun Fan et al.

NeurIPS 2025 · poster · arXiv:2506.12110 · 4 citations

EvolvedGRPO: Unlocking Reasoning in LVLMs via Progressive Instruction Evolution

Zhebei Shen, Qifan Yu, Juncheng Li et al.

NeurIPS 2025 · poster

MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO

Yicheng Xiao, Lin Song, Yukang Chen et al.

NeurIPS 2025 · poster · arXiv:2505.13031 · 18 citations

Non-convex entropic mean-field optimization via Best Response flow

Razvan-Andrei Lascu, Mateusz Majka

NeurIPS 2025 · poster · arXiv:2505.22760 · 1 citation

On the Convergence of Projected Policy Gradient for Any Constant Step Sizes

Jiacai Liu, Wenye Li, Dachao Lin et al.

NeurIPS 2025 · poster · arXiv:2311.01104 · 4 citations

Progress Reward Model for Reinforcement Learning via Large Language Models

Xiuhui Zhang, Ning Gao, Xingyu Jiang et al.

NeurIPS 2025 · poster

ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding

Yiyang Zhou, Yangfan He, Yaofeng Su et al.

NeurIPS 2025 · poster · arXiv:2506.01300 · 28 citations

Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding

Hanyin Wang, Zhenbang Wu, Gururaj Kolar et al.

NeurIPS 2025 · spotlight · arXiv:2505.21908 · 3 citations

Sharp Analysis for KL-Regularized Contextual Bandits and RLHF

Heyang Zhao, Chenlu Ye, Quanquan Gu et al.

NeurIPS 2025 · poster · arXiv:2411.04625 · 14 citations

Unlocking Multimodal Mathematical Reasoning via Process Reward Model

Ruilin Luo, Zhuofan Zheng, Lei Wang et al.

NeurIPS 2025 · poster · arXiv:2501.04686 · 29 citations