Spotlight "policy gradient methods" Papers
2 papers found
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
Siyan Zhao, Devaansh Gupta, Qinqing Zheng et al.
NeurIPS 2025 (spotlight), arXiv:2504.12216
75 citations
Learning Optimal Deterministic Policies with Stochastic Policy Gradients
Alessandro Montenegro, Marco Mussi, Alberto Maria Metelli et al.
ICML 2024 (spotlight)