"reinforcement learning optimization" Papers
5 papers found
HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models
Yu Zhou, Xingyu Wu, Jibin Wu et al.
NeurIPS 2025 spotlight arXiv:2409.18893
6 citations
Self-Verifying Reflection Helps Transformers with CoT Reasoning
Zhongwei Yu, Wannian Xia, Xue Yan et al.
NeurIPS 2025 poster arXiv:2510.12157
1 citation
The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training
Weize Chen, Jiarui Yuan, Jin Tailin et al.
NeurIPS 2025 poster arXiv:2505.19217
4 citations
Think Only When You Need with Large Hybrid-Reasoning Models
Lingjie Jiang, Xun Wu, Shaohan Huang et al.
NeurIPS 2025 poster arXiv:2505.14631
35 citations
ED-Copilot: Reduce Emergency Department Wait Time with Language Model Diagnostic Assistance
Liwen Sun, Abhineet Agarwal, Aaron Kornblith et al.
ICML 2024 poster