NeurIPS Poster "direct preference optimization" Papers

10 papers found

Aligning Compound AI Systems via System-level DPO

Xiangwen Wang, Yibo Jacky Zhang, Zhoujie Ding et al.

NeurIPS 2025posterarXiv:2502.17721
2
citations

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.

NeurIPS 2025posterarXiv:2505.17017
25
citations

Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning

Fanrui Zhang, Dian Li, Qiang Zhang et al.

NeurIPS 2025posterarXiv:2505.16836
4
citations

Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections

Bo Wang, Qinyuan Cheng, Runyu Peng et al.

NeurIPS 2025posterarXiv:2507.00018
14
citations

LeVo: High-Quality Song Generation with Multi-Preference Alignment

Shun Lei, Yaoxun XU, ZhiweiLin et al.

NeurIPS 2025posterarXiv:2506.07520
15
citations

Offline RL by Reward-Weighted Fine-Tuning for Conversation Optimization

Subhojyoti Mukherjee, Viet Lai, Raghavendra Addanki et al.

NeurIPS 2025posterarXiv:2506.06964
2
citations

On Extending Direct Preference Optimization to Accommodate Ties

Jinghong Chen, Guangyu Yang, Weizhe Lin et al.

NeurIPS 2025posterarXiv:2409.17431
5
citations

OpenOmni: Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-time Emotional Speech Synthesis

Run Luo, Ting-En Lin, Haonan Zhang et al.

NeurIPS 2025poster

Risk-aware Direct Preference Optimization under Nested Risk Measure

Lijun Zhang, Lin Li, Yajie Qi et al.

NeurIPS 2025posterarXiv:2505.20359
1
citations

SafeVid: Toward Safety Aligned Video Large Multimodal Models

Yixu Wang, Jiaxin Song, Yifeng Gao et al.

NeurIPS 2025posterarXiv:2505.11926
3
citations