"robustness" Papers
4 papers found
Conference
Adversarial Training of Reward Models
Alexander Bukharin, Haifeng Qian, Shengyang Sun et al.
COLM 2025, arXiv:2504.06141
7 citations
Fluid Language Model Benchmarking
Valentin Hofmann, David Heineman, Ian Magnusson et al.
COLM 2025, arXiv:2509.11106
10 citations
RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models
Juan Diego Rodriguez, Wenxuan Ding, Katrin Erk et al.
COLM 2025
2 citations
Environment Design for Inverse Reinforcement Learning
Thomas Kleine Buening, Victor Villin, Christos Dimitrakakis
ICML 2024, arXiv:2210.14972
4 citations