NEURIPS Poster "reinforcement learning optimization" Papers
4 papers found
Generative RLHF-V: Learning Principles from Multi-modal Human Preference
Jiayi Zhou, Jiaming Ji, Boyuan Chen et al.
NEURIPS 2025 poster arXiv:2505.18531
7 citations
Self-Verifying Reflection Helps Transformers with CoT Reasoning
Zhongwei Yu, Wannian Xia, Xue Yan et al.
NEURIPS 2025 poster arXiv:2510.12157
1 citation
The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training
Weize Chen, Jiarui Yuan, Jin Tailin et al.
NEURIPS 2025 poster arXiv:2505.19217
4 citations
Think Only When You Need with Large Hybrid-Reasoning Models
Lingjie Jiang, Xun Wu, Shaohan Huang et al.
NEURIPS 2025 poster arXiv:2505.14631
35 citations