Poster "human preference alignment" Papers

16 papers found

Beyond the Surface: Enhancing LLM-as-a-Judge Alignment with Human via Internal Representations

Peng Lai, Jianjie Zheng, Sijie Cheng et al.

NeurIPS 2025 poster · arXiv:2508.03550 · 2 citations

DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

Ruowen Zhao, James Jun Liang Chen Ye, Zhengyi Wang et al.

ICCV 2025 poster · arXiv:2503.15265 · 35 citations

Direct Alignment with Heterogeneous Preferences

Ali Shirali, Arash Nasr-Esfahany, Abdullah Alomar et al.

NeurIPS 2025 poster · arXiv:2502.16320 · 8 citations

Risk-aware Direct Preference Optimization under Nested Risk Measure

Lijun Zhang, Lin Li, Yajie Qi et al.

NeurIPS 2025 poster · arXiv:2505.20359 · 1 citation

RRM: Robust Reward Model Training Mitigates Reward Hacking

Tianqi Liu, Wei Xiong, Jie Ren et al.

ICLR 2025 poster · arXiv:2409.13156 · 44 citations

Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization

Noam Razin, Sadhika Malladi, Adithya Bhaskar et al.

ICLR 2025 poster · arXiv:2410.08847 · 47 citations

Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model

Wenhong Zhu, Zhiwei He, Xiaofeng Wang et al.

ICLR 2025 poster · arXiv:2410.18640 · 14 citations

WorldModelBench: Judging Video Generation Models As World Models

Dacheng Li, Yunhao Fang, Yukang Chen et al.

NeurIPS 2025 poster · arXiv:2502.20694 · 31 citations

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

Wei-Lin Chiang, Lianmin Zheng, Ying Sheng et al.

ICML 2024 poster

DreamReward: Aligning Human Preference in Text-to-3D Generation

Junliang Ye, Fangfu Liu, Qixiu Li et al.

ECCV 2024 poster

MaxMin-RLHF: Alignment with Diverse Human Preferences

Souradip Chakraborty, Jiahao Qiu, Hui Yuan et al.

ICML 2024 poster

MusicRL: Aligning Music Generation to Human Preferences

Geoffrey Cideron, Sertan Girgin, Mauro Verzetti et al.

ICML 2024 poster · arXiv:2301.11325

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

Rui Yang, Xiaoman Pan, Feng Luo et al.

ICML 2024 poster

Self-Rewarding Language Models

Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho et al.

ICML 2024 poster

Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning

Zongmeng Zhang, Yufeng Shi, Jinhua Zhu et al.

ICML 2024 poster

Understanding the Learning Dynamics of Alignment with Human Feedback

Shawn Im, Sharon Li

ICML 2024 poster