Papers by Maxwell Lin
4 papers found
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian et al.
ICLR 2025 (poster)
127 citations
Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition
Andy Zou, Maxwell Lin, Eliot Jones et al.
NeurIPS 2025 (poster)
Tamper-Resistant Safeguards for Open-Weight LLMs
Rishub Tamirisa, Bhrugu Bharathi, Long Phan et al.
ICLR 2025 (poster) · arXiv:2408.00761
108 citations
Teaching Large Language Models to Self-Debug
Xinyun Chen, Maxwell Lin, Nathanael Schaerli et al.
ICLR 2024 (poster)