2025 "human feedback" Papers
3 papers found
InterMT: Multi-Turn Interleaved Preference Alignment with Human Feedback
Boyuan Chen, Donghai Hong, Jiaming Ji et al.
NeurIPS 2025 (spotlight), arXiv:2505.23950, 1 citation
MallowsPO: Fine-Tune Your LLM with Preference Dispersions
Haoxian Chen, Hanyang Zhao, Henry Lam et al.
ICLR 2025 (poster), arXiv:2405.14953, 15 citations
Mitigating Reward Over-optimization in Direct Alignment Algorithms with Importance Sampling
Nguyen Phuc, Ngoc-Hieu Nguyen, Duy M. H. Nguyen et al.
NeurIPS 2025 (poster), arXiv:2506.08681