"direct preference optimization" Papers

63 papers found • Page 1 of 2

AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models

Ziyin Zhou, Yunpeng Luo, Yuanchen Wu et al.

ICCV 2025 • arXiv:2507.02664
13 citations

Aligning Compound AI Systems via System-level DPO

Xiangwen Wang, Yibo Jacky Zhang, Zhoujie Ding et al.

NEURIPS 2025 • arXiv:2502.17721
2 citations

Aligning Language Models Using Follow-up Likelihood as Reward Signal

Chen Zhang, Dading Chong, Feng Jiang et al.

AAAI 2025 • arXiv:2409.13948
6 citations

As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss

Xin Mao, Huimin Xu, Feng-Lin Li et al.

ICLR 2025 • arXiv:2410.04834
3 citations

Automated Filtering of Human Feedback Data for Aligning Text-to-Image Diffusion Models

Yongjin Yang, Sihyeon Kim, Hojung Jung et al.

ICLR 2025 • arXiv:2410.10166
2 citations

Beyond Human Data: Aligning Multimodal Large Language Models by Iterative Self-Evolution

Wentao Tan, Qiong Cao, Yibing Zhan et al.

AAAI 2025 • arXiv:2412.15650
7 citations

Boost Your Human Image Generation Model via Direct Preference Optimization

Sanghyeon Na, Yonggyu Kim, Hyunjoon Lee

CVPR 2025 (highlight) • arXiv:2405.20216
8 citations

CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images

Chen Cheng, Jiacheng Wei, Tianrun Chen et al.

CVPR 2025 • arXiv:2504.04753
15 citations

Can DPO Learn Diverse Human Values? A Theoretical Scaling Law

Shawn Im, Sharon Li

NEURIPS 2025 • arXiv:2408.03459
8 citations

CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs

Jinlan Fu, Shenzhen Huangfu, Hao Fei et al.

ICLR 2025 • arXiv:2501.16629
21 citations

Curriculum Direct Preference Optimization for Diffusion and Consistency Models

Florinel Croitoru, Vlad Hondru, Radu Tudor Ionescu et al.

CVPR 2025 • arXiv:2405.13637
24 citations

DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

Ruowen Zhao, James Jun Liang Chen Ye, Zhengyi Wang et al.

ICCV 2025 • arXiv:2503.15265
35 citations

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

Chengzhuo Tong, Ziyu Guo, Renrui Zhang et al.

NEURIPS 2025 • arXiv:2505.17017
26 citations

DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models

Ziyi Wu, Anil Kag, Ivan Skorokhodov et al.

NEURIPS 2025 (oral) • arXiv:2506.03517
14 citations

Do LVLMs Truly Understand Video Anomalies? Revealing Hallucination via Co-Occurrence Patterns

Menghao Zhang, Huazheng Wang, Pengfei Ren et al.

NEURIPS 2025

DriveDPO: Policy Learning via Safety DPO For End-to-End Autonomous Driving

Shuyao Shang, Yuntao Chen, Yuqi Wang et al.

NEURIPS 2025 • arXiv:2509.17940
8 citations

DSPO: Direct Score Preference Optimization for Diffusion Model Alignment

Huaisheng Zhu, Teng Xiao, Vasant Honavar

ICLR 2025
22 citations

Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective

Ruichen Shao, Bei Li, Gangao Liu et al.

ICLR 2025 (oral) • arXiv:2502.14340
7 citations

Enhancing SAM with Efficient Prompting and Preference Optimization for Semi-supervised Medical Image Segmentation

Aishik Konwer, Zhijian Yang, Erhan Bas et al.

CVPR 2025 • arXiv:2503.04639
8 citations

Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning

Fanrui Zhang, Dian Li, Qiang Zhang et al.

NEURIPS 2025 • arXiv:2505.16836
4 citations

Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs

Yifan Shen, Yuanzhe Liu, Jingyuan Zhu et al.

NEURIPS 2025 • arXiv:2506.21656
4 citations

Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment

Pengfei Zhao, Rongbo Luan, Wei Zhang et al.

NEURIPS 2025 • arXiv:2506.06970
1 citation

Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections

Bo Wang, Qinyuan Cheng, Runyu Peng et al.

NEURIPS 2025 • arXiv:2507.00018
15 citations

Improving Video Generation with Human Feedback

Jie Liu, Gongye Liu, Jiajun Liang et al.

NEURIPS 2025 • arXiv:2501.13918
127 citations

In-context Ranking Preference Optimization

Junda Wu, Rohan Surana, Zhouhang Xie et al.

COLM 2025 • arXiv:2504.15477
3 citations

ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO

Daechul Ahn, Yura Choi, San Kim et al.

AAAI 2025 • arXiv:2406.11280
3 citations

Learning Dynamics of LLM Finetuning

Yi Ren, Danica Sutherland

ICLR 2025 • arXiv:2407.10490
66 citations

Less is More: Improving LLM Alignment via Preference Data Selection

Xun Deng, Han Zhong, Rui Ai et al.

NEURIPS 2025 (spotlight) • arXiv:2502.14560

LeVo: High-Quality Song Generation with Multi-Preference Alignment

Shun Lei, Yaoxun Xu, Zhiwei Lin et al.

NEURIPS 2025 • arXiv:2506.07520
16 citations

Measuring memorization in RLHF for code completion

Jamie Hayes, Ilia Shumailov, Billy Porter et al.

ICLR 2025 • arXiv:2406.11715
10 citations

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Ziyu Liu, Yuhang Zang, Xiaoyi Dong et al.

ICLR 2025 • arXiv:2410.17637
22 citations

Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning

Xiaochuan Li, Zichun Yu, Chenyan Xiong

ICLR 2025 • arXiv:2410.14208
5 citations

Multi-step Visual Reasoning with Visual Tokens Scaling and Verification

Tianyi Bai, Zengjie Hu, Fupeng Sun et al.

NEURIPS 2025 • arXiv:2506.07235
14 citations

Offline RL by Reward-Weighted Fine-Tuning for Conversation Optimization

Subhojyoti Mukherjee, Viet Lai, Raghavendra Addanki et al.

NEURIPS 2025 • arXiv:2506.06964
3 citations

On Extending Direct Preference Optimization to Accommodate Ties

Jinghong Chen, Guangyu Yang, Weizhe Lin et al.

NEURIPS 2025 • arXiv:2409.17431
7 citations

Online Preference Alignment for Language Models via Count-based Exploration

Chenjia Bai, Yang Zhang, Shuang Qiu et al.

ICLR 2025 • arXiv:2501.12735
20 citations

OpenOmni: Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-time Emotional Speech Synthesis

Run Luo, Ting-En Lin, Haonan Zhang et al.

NEURIPS 2025

PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation

Qihan Huang, Weilong Dai, Jinlong Liu et al.

CVPR 2025 • arXiv:2412.03177
10 citations

Personalized Image Editing in Text-to-Image Diffusion Models via Collaborative Direct Preference Optimization

Connor Dunlop, Matthew Zheng, Kavana Venkatesh et al.

NEURIPS 2025 • arXiv:2511.05616
1 citation

Preference Optimization by Estimating the Ratio of the Data Distribution

Yeongmin Kim, HeeSun Bae, Byeonghu Na et al.

NEURIPS 2025 • arXiv:2505.19601
2 citations

Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM

Yatai Ji, Jiacheng Zhang, Jie Wu et al.

ICCV 2025 • arXiv:2412.15156
10 citations

RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization

Hanyang Zhao, Genta Winata, Anirban Das et al.

ICLR 2025 • arXiv:2410.04203
19 citations

Risk-aware Direct Preference Optimization under Nested Risk Measure

Lijun Zhang, Lin Li, Yajie Qi et al.

NEURIPS 2025 • arXiv:2505.20359
2 citations

SafeVid: Toward Safety Aligned Video Large Multimodal Models

Yixu Wang, Jiaxin Song, Yifeng Gao et al.

NEURIPS 2025 • arXiv:2505.11926
4 citations

Scalable Ranked Preference Optimization for Text-to-Image Generation

Shyamgopal Karthik, Huseyin Coskun, Zeynep Akata et al.

ICCV 2025 • arXiv:2410.18013
23 citations

Self-Supervised Direct Preference Optimization for Text-to-Image Diffusion Models

Liang Peng, Boxi Wu, Haoran Cheng et al.

NEURIPS 2025

SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation

Leigang Qu, Haochuan Li, Wenjie Wang et al.

CVPR 2025 • arXiv:2412.05818
10 citations

Systematic Reward Gap Optimization for Mitigating VLM Hallucinations

Lehan He, Zeren Chen, Zhelun Shi et al.

NEURIPS 2025 • arXiv:2411.17265
5 citations

TODO: Enhancing LLM Alignment with Ternary Preferences

Yuxiang Guo, Lu Yin, Bo Jiang et al.

ICLR 2025 • arXiv:2411.02442
5 citations

Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization

Noam Razin, Sadhika Malladi, Adithya Bhaskar et al.

ICLR 2025 • arXiv:2410.08847
51 citations