2025 Poster Papers on "Preference Optimization"

28 papers found

Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization

Mingzhe Du, Anh Tuan Luu, Yue Liu et al.

NeurIPS 2025 poster · arXiv:2505.23387 · 6 citations

Aligning Visual Contrastive learning models via Preference Optimization

Amirabbas Afzali, Borna Khodabandeh, Ali Rasekh et al.

ICLR 2025 poster · arXiv:2411.08923 · 3 citations

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux et al.

ICLR 2025 poster · arXiv:2410.18252 · 39 citations

Avoiding exp(R) scaling in RLHF through Preference-based Exploration

Mingyu Chen, Yiding Chen, Wen Sun et al.

NeurIPS 2025 poster · 3 citations

AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs

Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta et al.

ICCV 2025 poster · arXiv:2501.02135 · 9 citations

CPO: Condition Preference Optimization for Controllable Image Generation

Zonglin Lyu, Ming Li, Xinxin Liu et al.

NeurIPS 2025 poster · arXiv:2511.04753

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Di Zhang, Jingdi Lei, Junxian Li et al.

CVPR 2025 poster · arXiv:2411.18203 · 30 citations

Data Distillation for extrapolative protein design through exact preference optimization

Mostafa Karimi, Sharmi Banerjee, Tommi Jaakkola et al.

ICLR 2025 poster · 1 citation

Don’t Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models

Sohyun An, Ruochen Wang, Tianyi Zhou et al.

NeurIPS 2025 poster · 11 citations

IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance

Jiayi Guo, Chuanhao Yan, Xingqian Xu et al.

ICCV 2025 poster · arXiv:2509.26231 · 1 citation

Learning from negative feedback, or positive feedback or both

Abbas Abdolmaleki, Bilal Piot, Bobak Shahriari et al.

ICLR 2025 poster · arXiv:2410.04166 · 7 citations

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

Fangxun Shu, Yue Liao, Lei Zhang et al.

ICLR 2025 poster · arXiv:2408.15881 · 34 citations

LongVPO: From Anchored Cues to Self-Reasoning for Long-Form Video Preference Optimization

Zhenpeng Huang, Jiaqi Li, Zihan Jia et al.

NeurIPS 2025 poster

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Zhangchen Xu, Fengqing Jiang, Luyao Niu et al.

ICLR 2025 poster · arXiv:2406.08464 · 261 citations

MallowsPO: Fine-Tune Your LLM with Preference Dispersions

Haoxian Chen, Hanyang Zhao, Henry Lam et al.

ICLR 2025 poster · arXiv:2405.14953 · 15 citations

Meta-Learning Objectives for Preference Optimization

Carlo Alfano, Silvia Sapora, Jakob Foerster et al.

NeurIPS 2025 poster · arXiv:2411.06568 · 2 citations

Mitigating Reward Over-optimization in Direct Alignment Algorithms with Importance Sampling

Nguyen Phuc, Ngoc-Hieu Nguyen, Duy M. H. Nguyen et al.

NeurIPS 2025 poster · arXiv:2506.08681

On Extending Direct Preference Optimization to Accommodate Ties

Jinghong Chen, Guangyu Yang, Weizhe Lin et al.

NeurIPS 2025 poster · arXiv:2409.17431 · 5 citations

Preference Optimization on Pareto Sets: On a Theory of Multi-Objective Optimization

Abhishek Roy, Geelon So, Yian Ma

NeurIPS 2025 poster · 11 citations

Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback

Yi-Lun Wu, Bo-Kai Ruan, Chiang Tseng et al.

NeurIPS 2025 poster · arXiv:2510.18353

SeRA: Self-Reviewing and Alignment of LLMs using Implicit Reward Margins

Jongwoo Ko, Saket Dingliwal, Bhavana Ganesh et al.

ICLR 2025 poster · 5 citations

SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters

Teng Xiao, Yige Yuan, Zhengyu Chen et al.

ICLR 2025 poster · arXiv:2502.00883 · 23 citations

SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction

Ling Yang, Zhaochen Yu, Tianjun Zhang et al.

ICLR 2025 poster · arXiv:2410.09008 · 12 citations

Token-Level Self-Play with Importance-Aware Guidance for Large Language Models

Tue Le, Hoang Tran, Quyen Tran et al.

NeurIPS 2025 poster

Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization

Zichen Miao, Zhengyuan Yang, Kevin Lin et al.

ICLR 2025 poster · arXiv:2410.03190 · 14 citations

Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only

Jihan Yao, Wenxuan Ding, Shangbin Feng et al.

ICLR 2025 poster · arXiv:2410.11055 · 4 citations

Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model

Wenhong Zhu, Zhiwei He, Xiaofeng Wang et al.

ICLR 2025 poster · arXiv:2410.18640 · 14 citations

Weighted-Reward Preference Optimization for Implicit Model Fusion

Ziyi Yang, Fanqi Wan, Longguang Zhong et al.

ICLR 2025 poster · arXiv:2412.03187 · 12 citations