2025 "preference optimization" Papers
24 papers found
Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization
Mingzhe Du, Anh Tuan Luu, Yue Liu et al.
Aligning Visual Contrastive Learning Models via Preference Optimization
Amirabbas Afzali, Borna Khodabandeh, Ali Rasekh et al.
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux et al.
Avoiding exp(R) scaling in RLHF through Preference-based Exploration
Mingyu Chen, Yiding Chen, Wen Sun et al.
AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs
Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta et al.
CPO: Condition Preference Optimization for Controllable Image Generation
Zonglin Lyu, Ming Li, Xinxin Liu et al.
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Di Zhang, Jingdi Lei, Junxian Li et al.
Data Distillation for Extrapolative Protein Design through Exact Preference Optimization
Mostafa Karimi, Sharmi Banerjee, Tommi Jaakkola et al.
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
Jiayi Guo, Chuanhao Yan, Xingqian Xu et al.
Learning from negative feedback, or positive feedback or both
Abbas Abdolmaleki, Bilal Piot, Bobak Shahriari et al.
LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
Fangxun Shu, Yue Liao, Lei Zhang et al.
LongVPO: From Anchored Cues to Self-Reasoning for Long-Form Video Preference Optimization
Zhenpeng Huang, Jiaqi Li, Zihan Jia et al.
MallowsPO: Fine-Tune Your LLM with Preference Dispersions
Haoxian Chen, Hanyang Zhao, Henry Lam et al.
Meta-Learning Objectives for Preference Optimization
Carlo Alfano, Silvia Sapora, Jakob Foerster et al.
Mitigating Reward Over-optimization in Direct Alignment Algorithms with Importance Sampling
Nguyen Phuc, Ngoc-Hieu Nguyen, Duy M. H. Nguyen et al.
On Extending Direct Preference Optimization to Accommodate Ties
Jinghong Chen, Guangyu Yang, Weizhe Lin et al.
Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback
Yi-Lun Wu, Bo-Kai Ruan, Chiang Tseng et al.
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization
Pritam Sarkar, Ali Etemad
SeRA: Self-Reviewing and Alignment of LLMs using Implicit Reward Margins
Jongwoo Ko, Saket Dingliwal, Bhavana Ganesh et al.
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Teng Xiao, Yige Yuan, Zhengyu Chen et al.
SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
Ling Yang, Zhaochen Yu, Tianjun Zhang et al.
Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization
Zichen Miao, Zhengyuan Yang, Kevin Lin et al.
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Wenhong Zhu, Zhiwei He, Xiaofeng Wang et al.
Weighted-Reward Preference Optimization for Implicit Model Fusion
Ziyi Yang, Fanqi Wan, Longguang Zhong et al.