NeurIPS 2025 "policy gradient methods" Papers
3 papers found
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
Siyan Zhao, Devaansh Gupta, Qinqing Zheng et al.
NeurIPS 2025, spotlight, arXiv:2504.12216, 75 citations
On the Convergence of Projected Policy Gradient for Any Constant Step Sizes
Jiacai Liu, Wenye Li, Dachao Lin et al.
NeurIPS 2025, poster, arXiv:2311.01104, 4 citations
REINFORCE Converges to Optimal Policies with Any Learning Rate
Samuel Robertson, Thang Chu, Bo Dai et al.
NeurIPS 2025, poster