"human preference modeling" Papers
4 papers found
Advantage-Guided Distillation for Preference Alignment in Small Language Models
Shiping Gao, Fanqi Wan, Jiajian Guo et al.
ICLR 2025 (poster) · arXiv:2502.17927 · 4 citations
Direct Post-Training Preference Alignment for Multi-Agent Motion Generation Model Using Implicit Feedback from Pre-training Demonstrations
Thomas Tian, Kratarth Goel
ICLR 2025 (poster) · arXiv:2503.20105 · 4 citations
Learning Optimal Advantage from Preferences and Mistaking It for Reward
W Bradley Knox, Stephane Hatgis-Kessell, Sigurdur Orn Adalgeirsson et al.
AAAI 2024 (paper) · arXiv:2310.02456 · 15 citations
Multimodal Label Relevance Ranking via Reinforcement Learning
Taian Guo, Taolin Zhang, Haoqian Wu et al.
ECCV 2024 (poster) · arXiv:2407.13221 · 1 citation