NeurIPS Poster "policy optimization" Papers
16 papers found
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Kianté Brantley, Mingyu Chen, Zhaolin Gao et al.
NeurIPS 2025 | poster | arXiv:2505.20686
12 citations
A Differential and Pointwise Control Approach to Reinforcement Learning
Minh Nguyen, Chandrajit Bajaj
NeurIPS 2025 | poster | arXiv:2404.15617
1 citation
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models
Zhihang Lin, Mingbao Lin, Yuan Xie et al.
NeurIPS 2025 | poster | arXiv:2503.22342
47 citations
EconGym: A Scalable AI Testbed with Diverse Economic Tasks
Qirui Mi, Qipeng Yang, Zijun Fan et al.
NeurIPS 2025 | poster | arXiv:2506.12110
4 citations
EvolvedGRPO: Unlocking Reasoning in LVLMs via Progressive Instruction Evolution
Zhebei Shen, Qifan Yu, Juncheng Li et al.
NeurIPS 2025 | poster
How to Train Your LLM Web Agent: A Statistical Diagnosis
Dheeraj Vattikonda, Santhoshi Ravichandran, Emiliano Penaloza et al.
NeurIPS 2025 | poster | arXiv:2507.04103
4 citations
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Yicheng Xiao, Lin Song, Yukang Chen et al.
NeurIPS 2025 | poster | arXiv:2505.13031
18 citations
Non-convex entropic mean-field optimization via Best Response flow
Razvan-Andrei Lascu, Mateusz Majka
NeurIPS 2025 | poster | arXiv:2505.22760
1 citation
On the Convergence of Projected Policy Gradient for Any Constant Step Sizes
Jiacai Liu, Wenye Li, Dachao Lin et al.
NeurIPS 2025 | poster | arXiv:2311.01104
4 citations
On the Sample Complexity of Differentially Private Policy Optimization
Yi He, Xingyu Zhou
NeurIPS 2025 | poster | arXiv:2510.21060
Progress Reward Model for Reinforcement Learning via Large Language Models
Xiuhui Zhang, Ning Gao, Xingyu Jiang et al.
NeurIPS 2025 | poster
Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions
Simon Matrenok, Skander Moalla, Caglar Gulcehre
NeurIPS 2025 | poster | arXiv:2507.08068
ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding
Yiyang Zhou, Yangfan He, Yaofeng Su et al.
NeurIPS 2025 | poster | arXiv:2506.01300
28 citations
Reinforced Active Learning for Large-Scale Virtual Screening with Learnable Policy Model
Yicong Chen, Jiahua Rao, Jiancong Xie et al.
NeurIPS 2025 | poster
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao, Chenlu Ye, Quanquan Gu et al.
NeurIPS 2025 | poster | arXiv:2411.04625
14 citations
Unlocking Multimodal Mathematical Reasoning via Process Reward Model
Ruilin Luo, Zhuofan Zheng, Lei Wang et al.
NeurIPS 2025 | poster | arXiv:2501.04686
29 citations