Papers by Eliot Jones
2 papers found
Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models
Andy K Zhang, Neil Perry, Riya Dulepet et al.
ICLR 2025 poster
Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition
Andy Zou, Maxwell Lin, Eliot Jones et al.
NeurIPS 2025 poster