Poster "direct preference optimization" Papers
30 papers found
AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models
Ziyin Zhou, Yunpeng Luo, Yuanchen Wu et al.
Aligning Compound AI Systems via System-level DPO
Xiangwen Wang, Yibo Jacky Zhang, Zhoujie Ding et al.
Curriculum Direct Preference Optimization for Diffusion and Consistency Models
Florinel Croitoru, Vlad Hondru, Radu Tudor Ionescu et al.
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
Ruowen Zhao, James Jun Liang Chen Ye, Zhengyi Wang et al.
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.
DSPO: Direct Score Preference Optimization for Diffusion Model Alignment
Huaisheng Zhu, Teng Xiao, Vasant Honavar
Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning
Fanrui Zhang, Dian Li, Qiang Zhang et al.
Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections
Bo Wang, Qinyuan Cheng, Runyu Peng et al.
LeVo: High-Quality Song Generation with Multi-Preference Alignment
Shun Lei, Yaoxun Xu, Zhiwei Lin et al.
Measuring memorization in RLHF for code completion
Jamie Hayes, Ilia Shumailov, Billy Porter et al.
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
Ziyu Liu, Yuhang Zang, Xiaoyi Dong et al.
Offline RL by Reward-Weighted Fine-Tuning for Conversation Optimization
Subhojyoti Mukherjee, Viet Lai, Raghavendra Addanki et al.
On Extending Direct Preference Optimization to Accommodate Ties
Jinghong Chen, Guangyu Yang, Weizhe Lin et al.
OpenOmni: Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-time Emotional Speech Synthesis
Run Luo, Ting-En Lin, Haonan Zhang et al.
PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation
Qihan Huang, Weilong Dai, Jinlong Liu et al.
Risk-aware Direct Preference Optimization under Nested Risk Measure
Lijun Zhang, Lin Li, Yajie Qi et al.
SafeVid: Toward Safety Aligned Video Large Multimodal Models
Yixu Wang, Jiaxin Song, Yifeng Gao et al.
Scalable Ranked Preference Optimization for Text-to-Image Generation
Shyamgopal Karthik, Huseyin Coskun, Zeynep Akata et al.
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
Noam Razin, Sadhika Malladi, Adithya Bhaskar et al.
VideoDPO: Omni-Preference Alignment for Video Diffusion Generation
Runtao Liu, Haoyu Wu, Zheng Ziqiang et al.
Active Preference Learning for Large Language Models
William Muldrew, Peter Hayes, Mingtian Zhang et al.
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee, Xiaoyan Bai, Itamar Pres et al.
BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback
Gaurav Pandey, Yatin Nandwani, Tahira Naseem et al.
GRATH: Gradual Self-Truthifying for Large Language Models
Weixin Chen, Dawn Song, Bo Li
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Shusheng Xu, Wei Fu, Jiaxuan Gao et al.
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint
Wei Xiong, Hanze Dong, Chenlu Ye et al.
Provably Robust DPO: Aligning Language Models with Noisy Feedback
Sayak Ray Chowdhury, Anush Kini, Nagarajan Natarajan
Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences
Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban et al.
Token-level Direct Preference Optimization
Yongcheng Zeng, Guoqing Liu, Weiyu Ma et al.
Towards Efficient Exact Optimization of Language Model Alignment
Haozhe Ji, Cheng Lu, Yilin Niu et al.