"scalable oversight" Papers
3 papers found
Preference Learning with Lie Detectors can Induce Honesty or Evasion
Chris Cundy, Adam Gleave
NeurIPS 2025 poster, arXiv:2505.13787
4 citations
Scaling Laws For Scalable Oversight
Joshua Engels, David Baek, Subhash Kantamneni et al.
NeurIPS 2025 spotlight, arXiv:2504.18530
4 citations
Assessing Large Language Models on Climate Information
Jannis Bulian, Mike Schäfer, Afra Amini et al.
ICML 2024 poster