NeurIPS "direct preference optimization" Papers
9 papers found
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.
NeurIPS 2025 (poster), arXiv:2505.17017, 25 citations
Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning
Fanrui Zhang, Dian Li, Qiang Zhang et al.
NeurIPS 2025 (poster), arXiv:2505.16836, 4 citations
Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections
Bo Wang, Qinyuan Cheng, Runyu Peng et al.
NeurIPS 2025 (poster), arXiv:2507.00018, 14 citations
LeVo: High-Quality Song Generation with Multi-Preference Alignment
Shun Lei, Yaoxun Xu, Zhiwei Lin et al.
NeurIPS 2025 (poster), arXiv:2506.07520, 15 citations
Offline RL by Reward-Weighted Fine-Tuning for Conversation Optimization
Subhojyoti Mukherjee, Viet Lai, Raghavendra Addanki et al.
NeurIPS 2025 (poster), arXiv:2506.06964, 2 citations
On Extending Direct Preference Optimization to Accommodate Ties
Jinghong Chen, Guangyu Yang, Weizhe Lin et al.
NeurIPS 2025 (poster), arXiv:2409.17431, 5 citations
OpenOmni: Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-time Emotional Speech Synthesis
Run Luo, Ting-En Lin, Haonan Zhang et al.
NeurIPS 2025 (poster)
Risk-aware Direct Preference Optimization under Nested Risk Measure
Lijun Zhang, Lin Li, Yajie Qi et al.
NeurIPS 2025 (poster), arXiv:2505.20359, 1 citation
SafeVid: Toward Safety Aligned Video Large Multimodal Models
Yixu Wang, Jiaxin Song, Yifeng Gao et al.
NeurIPS 2025 (poster), arXiv:2505.11926, 3 citations