NEURIPS "deception detection" Papers
2 papers found
Among Us: A Sandbox for Measuring and Detecting Agentic Deception
Satvik Golechha, Adrià Garriga-Alonso
NEURIPS 2025, spotlight, arXiv:2504.04072
7 citations
CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring
Benjamin Arnav, Pablo Bernabeu-Perez, Nathan Helm-Burger et al.
NEURIPS 2025, poster, arXiv:2505.23575
12 citations