Poster "AI Safety" Papers
11 papers found
A Black Swan Hypothesis: The Role of Human Irrationality in AI Safety
Hyunin Lee, Chanwoo Park, David Abel et al.
ICLR 2025 poster · arXiv:2407.18422 · 4 citations
Combining Cost Constrained Runtime Monitors for AI Safety
Tim Hua, James Baskerville, Henri Lemoine et al.
NeurIPS 2025 poster · arXiv:2507.15886 · 8 citations
Position: Require Frontier AI Labs To Release Small "Analog" Models
Shriyash Upadhyay, Philip Quirke, Narmeen Oozeer et al.
NeurIPS 2025 poster
AI Control: Improving Safety Despite Intentional Subversion
Ryan Greenblatt, Buck Shlegeris, Kshitij Sachan et al.
ICML 2024 poster
Feedback Loops With Language Models Drive In-Context Reward Hacking
Alexander Pan, Erik Jones, Meena Jagadeesan et al.
ICML 2024 poster
Fundamental Limitations of Alignment in Large Language Models
Yotam Wolf, Noam Wies, Oshri Avnery et al.
ICML 2024 poster
Position: Cracking the Code of Cascading Disparity Towards Marginalized Communities
Golnoosh Farnadi, Mohammad Havaei, Negar Rostamzadeh
ICML 2024 poster
Position: Explain to Question not to Justify
Przemyslaw Biecek, Wojciech Samek
ICML 2024 poster
Position: Near to Mid-term Risks and Opportunities of Open-Source Generative AI
Francisco Eiras, Aleksandar Petrov, Bertie Vidgen et al.
ICML 2024 poster
Position: Open-Endedness is Essential for Artificial Superhuman Intelligence
Edward Hughes, Michael Dennis, Jack Parker-Holder et al.
ICML 2024 poster
Scalable AI Safety via Doubly-Efficient Debate
Jonah Brown-Cohen, Geoffrey Irving, Georgios Piliouras
ICML 2024 poster