2025 Poster "policy optimization" Papers
9 papers found
$q$-exponential family for policy optimization
Lingwei Zhu, Haseeb Shah, Han Wang et al.
ICLR 2025posterarXiv:2408.07245
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models
Zhihang Lin, Mingbao Lin, Yuan Xie et al.
NeurIPS 2025posterarXiv:2503.22342
47
citations
EconGym: A Scalable AI Testbed with Diverse Economic Tasks
Qirui Mi, Qipeng Yang, Zijun Fan et al.
NeurIPS 2025posterarXiv:2506.12110
4
citations
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Yicheng Xiao, Lin Song, Yukang Chen et al.
NeurIPS 2025posterarXiv:2505.13031
18
citations
Non-convex entropic mean-field optimization via Best Response flow
Razvan-Andrei Lascu, Mateusz Majka
NeurIPS 2025posterarXiv:2505.22760
1
citations
On the Convergence of Projected Policy Gradient for Any Constant Step Sizes
Jiacai Liu, Wenye Li, Dachao Lin et al.
NeurIPS 2025posterarXiv:2311.01104
4
citations
Progress Reward Model for Reinforcement Learning via Large Language Models
Xiuhui Zhang, Ning Gao, Xingyu Jiang et al.
NeurIPS 2025poster
RRM: Robust Reward Model Training Mitigates Reward Hacking
Tianqi Liu, Wei Xiong, Jie Ren et al.
ICLR 2025posterarXiv:2409.13156
44
citations
Unlocking Multimodal Mathematical Reasoning via Process Reward Model
Ruilin Luo, Zhuofan Zheng, Lei Wang et al.
NeurIPS 2025posterarXiv:2501.04686
29
citations