Spotlight "reward modeling" Papers
4 papers found
Checklists Are Better Than Reward Models For Aligning Language Models
Vijay Viswanathan, Yanchao Sun, Xiang Kong et al.
NeurIPS 2025, spotlight, arXiv:2507.18624
32 citations
MJ-Video: Benchmarking and Rewarding Video Generation with Fine-Grained Video Preference
Haibo Tong, Zhaoyang Wang, Zhaorun Chen et al.
NeurIPS 2025, spotlight
Reverse Engineering Human Preferences with Reinforcement Learning
Lisa Alazraki, Yi-Chern Tan, Jon Ander Campos et al.
NeurIPS 2025, spotlight, arXiv:2505.15795
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
Yanda Chen, Ruiqi Zhong, Narutatsu Ri et al.
ICML 2024, spotlight, arXiv:2307.08678
79 citations