2025 "reinforcement learning optimization" Papers
6 papers found
DistillDrive: End-to-End Multi-Mode Autonomous Driving Distillation by Isomorphic Hetero-Source Planning Model
Rui Yu, Xianghang Zhang, Runkai Zhao et al.
ICCV 2025, poster, arXiv:2508.05402
4 citations
Generative RLHF-V: Learning Principles from Multi-modal Human Preference
Jiayi Zhou, Jiaming Ji, Boyuan Chen et al.
NeurIPS 2025, poster, arXiv:2505.18531
7 citations
HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models
Yu Zhou, Xingyu Wu, Jibin Wu et al.
NeurIPS 2025, spotlight, arXiv:2409.18893
6 citations
Self-Verifying Reflection Helps Transformers with CoT Reasoning
Zhongwei Yu, Wannian Xia, Xue Yan et al.
NeurIPS 2025, poster, arXiv:2510.12157
1 citation
The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training
Weize Chen, Jiarui Yuan, Jin Tailin et al.
NeurIPS 2025, poster, arXiv:2505.19217
4 citations
Think Only When You Need with Large Hybrid-Reasoning Models
Lingjie Jiang, Xun Wu, Shaohan Huang et al.
NeurIPS 2025, poster, arXiv:2505.14631
35 citations