Poster "preference optimization" Papers
52 papers found • Page 1 of 2
A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement
Hui Yuan, Yifan Zeng, Yue Wu et al.
Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization
Mingzhe Du, Anh Tuan Luu, Yue Liu et al.
Aligning Visual Contrastive Learning Models via Preference Optimization
Amirabbas Afzali, Borna Khodabandeh, Ali Rasekh et al.
As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss
Xin Mao, Huimin Xu, Feng-Lin Li et al.
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux et al.
Avoiding exp(R) scaling in RLHF through Preference-based Exploration
Mingyu Chen, Yiding Chen, Wen Sun et al.
AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs
Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta et al.
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
Wenxuan Zhang, Philip Torr, Mohamed Elhoseiny et al.
Calibrated Multi-Preference Optimization for Aligning Diffusion Models
Kyungmin Lee, Xiaohang Li, Qifei Wang et al.
CPO: Condition Preference Optimization for Controllable Image Generation
Zonglin Lyu, Ming Li, Xinxin Liu et al.
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Di Zhang, Jingdi Lei, Junxian Li et al.
Data Distillation for extrapolative protein design through exact preference optimization
Mostafa Karimi, Sharmi Banerjee, Tommi Jaakkola et al.
Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization
Tao Zhang, Cheng Da, Kun Ding et al.
Don’t Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models
Sohyun An, Ruochen Wang, Tianyi Zhou et al.
Factorized Learning for Temporally Grounded Video-Language Models
Wenzheng Zeng, Difei Gao, Mike Zheng Shou et al.
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
Jiayi Guo, Chuanhao Yan, Xingqian Xu et al.
Learning from negative feedback, or positive feedback or both
Abbas Abdolmaleki, Bilal Piot, Bobak Shahriari et al.
LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
Fangxun Shu, Yue Liao, Lei Zhang et al.
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
Guanzheng Chen, Xin Li, Michael Qizhe Shieh et al.
LongVPO: From Anchored Cues to Self-Reasoning for Long-Form Video Preference Optimization
Zhenpeng Huang, Jiaqi Li, Zihan Jia et al.
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Yushi Bai, Jiajie Zhang, Xin Lv et al.
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Zhangchen Xu, Fengqing Jiang, Luyao Niu et al.
MallowsPO: Fine-Tune Your LLM with Preference Dispersions
Haoxian Chen, Hanyang Zhao, Henry Lam et al.
Meta-Learning Objectives for Preference Optimization
Carlo Alfano, Silvia Sapora, Jakob Foerster et al.
Mitigating Reward Over-optimization in Direct Alignment Algorithms with Importance Sampling
Nguyen Phuc, Ngoc-Hieu Nguyen, Duy M. H. Nguyen et al.
nvBench 2.0: Resolving Ambiguity in Text-to-Visualization through Stepwise Reasoning
Tianqi Luo, Chuhan Huang, Leixian Shen et al.
On Extending Direct Preference Optimization to Accommodate Ties
Jinghong Chen, Guangyu Yang, Weizhe Lin et al.
OptiScene: LLM-driven Indoor Scene Layout Generation via Scaled Human-aligned Data Synthesis and Multi-Stage Preference Optimization
Yixuan Yang, Zhen Luo, Tongsheng Ding et al.
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
Qinghao Ye, Xianhan Zeng, Fu Li et al.
Preference Distillation via Value based Reinforcement Learning
Minchan Kwon, Junwon Ko, Kangil Kim et al.
Preference Optimization by Estimating the Ratio of the Data Distribution
Yeongmin Kim, HeeSun Bae, Byeonghu Na et al.
Preference Optimization for Reasoning with Pseudo Feedback
Fangkai Jiao, Geyang Guo, Xingxing Zhang et al.
Preference Optimization on Pareto Sets: On a Theory of Multi-Objective Optimization
Abhishek Roy, Geelon So, Yian Ma
RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
Hanyang Zhao, Genta Winata, Anirban Das et al.
Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback
Yi-Lun Wu, Bo-Kai Ruan, Chiang Tseng et al.
SeRA: Self-Reviewing and Alignment of LLMs using Implicit Reward Margins
Jongwoo Ko, Saket Dingliwal, Bhavana Ganesh et al.
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Teng Xiao, Yige Yuan, Zhengyu Chen et al.
SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
Ling Yang, Zhaochen Yu, Tianjun Zhang et al.
TODO: Enhancing LLM Alignment with Ternary Preferences
Yuxiang Guo, Lu Yin, Bo Jiang et al.
Token-Level Self-Play with Importance-Aware Guidance for Large Language Models
Tue Le, Hoang Tran, Quyen Tran et al.
Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization
Zichen Miao, Zhengyuan Yang, Kevin Lin et al.
Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
Jihan Yao, Wenxuan Ding, Shangbin Feng et al.
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Wenhong Zhu, Zhiwei He, Xiaofeng Wang et al.
Weighted-Reward Preference Optimization for Implicit Model Fusion
Ziyi Yang, Fanqi Wan, Longguang Zhong et al.
Zooming from Context to Cue: Hierarchical Preference Optimization for Multi-Image MLLMs
Xudong Li, Mengdan Zhang, Peixian Chen et al.
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
Gokul Swamy, Christoph Dann, Rahul Kidambi et al.
Can AI Assistants Know What They Don't Know?
Qinyuan Cheng, Tianxiang Sun, Xiangyang Liu et al.
Generalized Preference Optimization: A Unified Approach to Offline Alignment
Yunhao Tang, Zhaohan Guo, Zeyu Zheng et al.
Human Alignment of Large Language Models through Online Preference Optimisation
Daniele Calandriello, Zhaohan Guo, Rémi Munos et al.
Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models
Songtao Liu, Hanjun Dai, Yue Zhao et al.