Paper "direct preference optimization" Papers
5 papers found
Aligning Language Models Using Follow-up Likelihood as Reward Signal
Chen Zhang, Dading Chong, Feng Jiang et al.
AAAI 2025 · arXiv:2409.13948 · 6 citations
Beyond Human Data: Aligning Multimodal Large Language Models by Iterative Self-Evolution
Wentao Tan, Qiong Cao, Yibing Zhan et al.
AAAI 2025 · arXiv:2412.15650 · 7 citations
In-context Ranking Preference Optimization
Junda Wu, Rohan Surana, Zhouhang Xie et al.
COLM 2025 · arXiv:2504.15477 · 3 citations
ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO
Daechul Ahn, Yura Choi, San Kim et al.
AAAI 2025 · arXiv:2406.11280 · 3 citations
Detecting and Preventing Hallucinations in Large Vision Language Models
Anisha Gunjal, Jihan Yin, Erhan Bas
AAAI 2024 · arXiv:2308.06394 · 264 citations