"preference optimization" Papers

18 papers found

Aligning Visual Contrastive learning models via Preference Optimization

Amirabbas Afzali, Borna khodabandeh, Ali Rasekh et al.

ICLR 2025posterarXiv:2411.08923
3
citations

CPO: Condition Preference Optimization for Controllable Image Generation

Zonglin Lyu, Ming Li, Xinxin Liu et al.

NeurIPS 2025posterarXiv:2511.04753

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Di Zhang, Jingdi Lei, Junxian Li et al.

CVPR 2025posterarXiv:2411.18203
30
citations

IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance

Jiayi Guo, Chuanhao Yan, Xingqian Xu et al.

ICCV 2025posterarXiv:2509.26231
1
citations

Learning from negative feedback, or positive feedback or both

Abbas Abdolmaleki, Bilal Piot, Bobak Shahriari et al.

ICLR 2025posterarXiv:2410.04166
7
citations

LongVPO: From Anchored Cues to Self-Reasoning for Long-Form Video Preference Optimization

Zhenpeng Huang, Jiaqi Li, zihan jia et al.

NeurIPS 2025poster

Meta-Learning Objectives for Preference Optimization

Carlo Alfano, Silvia Sapora, Jakob Foerster et al.

NeurIPS 2025posterarXiv:2411.06568
2
citations

On Extending Direct Preference Optimization to Accommodate Ties

Jinghong Chen, Guangyu Yang, Weizhe Lin et al.

NeurIPS 2025posterarXiv:2409.17431
5
citations

SeRA: Self-Reviewing and Alignment of LLMs using Implicit Reward Margins

Jongwoo Ko, Saket Dingliwal, Bhavana Ganesh et al.

ICLR 2025poster
5
citations

SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters

Teng Xiao, Yige Yuan, Zhengyu Chen et al.

ICLR 2025posterarXiv:2502.00883
23
citations

SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction

Ling Yang, Zhaochen Yu, Tianjun Zhang et al.

ICLR 2025posterarXiv:2410.09008
12
citations

A Minimaximalist Approach to Reinforcement Learning from Human Feedback

Gokul Swamy, Christoph Dann, Rahul Kidambi et al.

ICML 2024poster

Can AI Assistants Know What They Don't Know?

Qinyuan Cheng, Tianxiang Sun, Xiangyang Liu et al.

ICML 2024poster

Generalized Preference Optimization: A Unified Approach to Offline Alignment

Yunhao Tang, Zhaohan Guo, Zeyu Zheng et al.

ICML 2024poster

Human Alignment of Large Language Models through Online Preference Optimisation

Daniele Calandriello, Zhaohan Guo, REMI MUNOS et al.

ICML 2024poster

Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models

Songtao Liu, Hanjun Dai, Yue Zhao et al.

ICML 2024poster

RLVF: Learning from Verbal Feedback without Overgeneralization

Moritz Stephan, Alexander Khazatsky, Eric Mitchell et al.

ICML 2024poster

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

Zixiang Chen, Yihe Deng, Huizhuo Yuan et al.

ICML 2024poster