Poster papers matching "human preference alignment"
16 papers found
Beyond the Surface: Enhancing LLM-as-a-Judge Alignment with Human via Internal Representations
Peng Lai, Jianjie Zheng, Sijie Cheng et al.
NeurIPS 2025 (poster) · arXiv:2508.03550 · 2 citations
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
Ruowen Zhao, James Jun Liang Chen Ye, Zhengyi Wang et al.
ICCV 2025 (poster) · arXiv:2503.15265 · 35 citations
Direct Alignment with Heterogeneous Preferences
Ali Shirali, Arash Nasr-Esfahany, Abdullah Alomar et al.
NeurIPS 2025 (poster) · arXiv:2502.16320 · 8 citations
Risk-aware Direct Preference Optimization under Nested Risk Measure
Lijun Zhang, Lin Li, Yajie Qi et al.
NeurIPS 2025 (poster) · arXiv:2505.20359 · 1 citation
RRM: Robust Reward Model Training Mitigates Reward Hacking
Tianqi Liu, Wei Xiong, Jie Ren et al.
ICLR 2025 (poster) · arXiv:2409.13156 · 44 citations
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
Noam Razin, Sadhika Malladi, Adithya Bhaskar et al.
ICLR 2025 (poster) · arXiv:2410.08847 · 47 citations
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Wenhong Zhu, Zhiwei He, Xiaofeng Wang et al.
ICLR 2025 (poster) · arXiv:2410.18640 · 14 citations
WorldModelBench: Judging Video Generation Models As World Models
Dacheng Li, Yunhao Fang, Yukang Chen et al.
NeurIPS 2025 (poster) · arXiv:2502.20694 · 31 citations
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Wei-Lin Chiang, Lianmin Zheng, Ying Sheng et al.
ICML 2024 (poster)
DreamReward: Aligning Human Preference in Text-to-3D Generation
Junliang Ye, Fangfu Liu, Qixiu Li et al.
ECCV 2024 (poster)
MaxMin-RLHF: Alignment with Diverse Human Preferences
Souradip Chakraborty, Jiahao Qiu, Hui Yuan et al.
ICML 2024 (poster)
MusicRL: Aligning Music Generation to Human Preferences
Geoffrey Cideron, Sertan Girgin, Mauro Verzetti et al.
ICML 2024 (poster) · arXiv:2301.11325
Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
Rui Yang, Xiaoman Pan, Feng Luo et al.
ICML 2024 (poster)
Self-Rewarding Language Models
Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho et al.
ICML 2024 (poster)
Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning
Zongmeng Zhang, Yufeng Shi, Jinhua Zhu et al.
ICML 2024 (poster)
Understanding the Learning Dynamics of Alignment with Human Feedback
Shawn Im, Sharon Li
ICML 2024 (poster)