Poster "direct preference optimization" Papers
54 papers found • Page 1 of 2
AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models
Ziyin Zhou, Yunpeng Luo, Yuanchen Wu et al.
Aligning Compound AI Systems via System-level DPO
Xiangwen Wang, Yibo Jacky Zhang, Zhoujie Ding et al.
As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss
Xin Mao, Huimin Xu, Feng-Lin Li et al.
Automated Filtering of Human Feedback Data for Aligning Text-to-Image Diffusion Models
Yongjin Yang, Sihyeon Kim, Hojung Jung et al.
CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images
Chen Cheng, Jiacheng Wei, Tianrun Chen et al.
Can DPO Learn Diverse Human Values? A Theoretical Scaling Law
Shawn Im, Sharon Li
CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs
Jinlan Fu, Shenzhen Huangfu, Hao Fei et al.
Curriculum Direct Preference Optimization for Diffusion and Consistency Models
Florinel Croitoru, Vlad Hondru, Radu Tudor Ionescu et al.
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
Ruowen Zhao, James Jun Liang Chen Ye, Zhengyi Wang et al.
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.
Do LVLMs Truly Understand Video Anomalies? Revealing Hallucination via Co-Occurrence Patterns
Menghao Zhang, Huazheng Wang, Pengfei Ren et al.
DriveDPO: Policy Learning via Safety DPO For End-to-End Autonomous Driving
Shuyao Shang, Yuntao Chen, Yuqi Wang et al.
DSPO: Direct Score Preference Optimization for Diffusion Model Alignment
Huaisheng Zhu, Teng Xiao, Vasant Honavar
Enhancing SAM with Efficient Prompting and Preference Optimization for Semi-supervised Medical Image Segmentation
Aishik Konwer, Zhijian Yang, Erhan Bas et al.
Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning
Fanrui Zhang, Dian Li, Qiang Zhang et al.
Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs
Yifan Shen, Yuanzhe Liu, Jingyuan Zhu et al.
Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment
Pengfei Zhao, Rongbo Luan, Wei Zhang et al.
Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections
Bo Wang, Qinyuan Cheng, Runyu Peng et al.
Improving Video Generation with Human Feedback
Jie Liu, Gongye Liu, Jiajun Liang et al.
Learning Dynamics of LLM Finetuning
Yi Ren, Danica Sutherland
LeVo: High-Quality Song Generation with Multi-Preference Alignment
Shun Lei, Yaoxun Xu, Zhiwei Lin et al.
Measuring memorization in RLHF for code completion
Jamie Hayes, Ilia Shumailov, Billy Porter et al.
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
Ziyu Liu, Yuhang Zang, Xiaoyi Dong et al.
Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning
Xiaochuan Li, Zichun Yu, Chenyan Xiong
Multi-step Visual Reasoning with Visual Tokens Scaling and Verification
Tianyi Bai, Zengjie Hu, Fupeng Sun et al.
Offline RL by Reward-Weighted Fine-Tuning for Conversation Optimization
Subhojyoti Mukherjee, Viet Lai, Raghavendra Addanki et al.
On Extending Direct Preference Optimization to Accommodate Ties
Jinghong Chen, Guangyu Yang, Weizhe Lin et al.
Online Preference Alignment for Language Models via Count-based Exploration
Chenjia Bai, Yang Zhang, Shuang Qiu et al.
OpenOmni: Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-time Emotional Speech Synthesis
Run Luo, Ting-En Lin, Haonan Zhang et al.
PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation
Qihan Huang, Weilong Dai, Jinlong Liu et al.
Personalized Image Editing in Text-to-Image Diffusion Models via Collaborative Direct Preference Optimization
Connor Dunlop, Matthew Zheng, Kavana Venkatesh et al.
Preference Optimization by Estimating the Ratio of the Data Distribution
Yeongmin Kim, HeeSun Bae, Byeonghu Na et al.
Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM
Yatai Ji, Jiacheng Zhang, Jie Wu et al.
RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
Hanyang Zhao, Genta Winata, Anirban Das et al.
Risk-aware Direct Preference Optimization under Nested Risk Measure
Lijun Zhang, Lin Li, Yajie Qi et al.
SafeVid: Toward Safety Aligned Video Large Multimodal Models
Yixu Wang, Jiaxin Song, Yifeng Gao et al.
Scalable Ranked Preference Optimization for Text-to-Image Generation
Shyamgopal Karthik, Huseyin Coskun, Zeynep Akata et al.
Self-Supervised Direct Preference Optimization for Text-to-Image Diffusion Models
Liang Peng, Boxi Wu, Haoran Cheng et al.
SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation
Leigang Qu, Haochuan Li, Wenjie Wang et al.
Systematic Reward Gap Optimization for Mitigating VLM Hallucinations
Lehan He, Zeren Chen, Zhelun Shi et al.
TODO: Enhancing LLM Alignment with Ternary Preferences
Yuxiang Guo, Lu Yin, Bo Jiang et al.
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
Noam Razin, Sadhika Malladi, Adithya Bhaskar et al.
VideoDPO: Omni-Preference Alignment for Video Diffusion Generation
Runtao Liu, Haoyu Wu, Zheng Ziqiang et al.
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang, Lei Ying
Active Preference Learning for Large Language Models
William Muldrew, Peter Hayes, Mingtian Zhang et al.
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee, Xiaoyan Bai, Itamar Pres et al.
BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback
Gaurav Pandey, Yatin Nandwani, Tahira Naseem et al.
GRATH: Gradual Self-Truthifying for Large Language Models
Weixin Chen, Dawn Song, Bo Li
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Shusheng Xu, Wei Fu, Jiaxuan Gao et al.
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint
Wei Xiong, Hanze Dong, Chenlu Ye et al.