ICLR "preference optimization" Papers
9 papers found

Aligning Visual Contrastive learning models via Preference Optimization
Amirabbas Afzali, Borna Khodabandeh, Ali Rasekh et al.
ICLR 2025 poster · arXiv:2411.08923
3 citations

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux et al.
ICLR 2025 poster · arXiv:2410.18252
39 citations

Data Distillation for extrapolative protein design through exact preference optimization
Mostafa Karimi, Sharmi Banerjee, Tommi Jaakkola et al.
ICLR 2025 poster
1 citation

Learning from negative feedback, or positive feedback or both
Abbas Abdolmaleki, Bilal Piot, Bobak Shahriari et al.
ICLR 2025 poster · arXiv:2410.04166
7 citations

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
Fangxun Shu, Yue Liao, Lei Zhang et al.
ICLR 2025 poster · arXiv:2408.15881
34 citations

SeRA: Self-Reviewing and Alignment of LLMs using Implicit Reward Margins
Jongwoo Ko, Saket Dingliwal, Bhavana Ganesh et al.
ICLR 2025 poster
5 citations

SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Teng Xiao, Yige Yuan, Zhengyu Chen et al.
ICLR 2025 poster · arXiv:2502.00883
23 citations

SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
Ling Yang, Zhaochen Yu, Tianjun Zhang et al.
ICLR 2025 poster · arXiv:2410.09008
12 citations

Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Wenhong Zhu, Zhiwei He, Xiaofeng Wang et al.
ICLR 2025 poster · arXiv:2410.18640
14 citations