Most Cited ICLR 2025 Papers by Maxwell Lin
2 papers found
#1
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian et al.
ICLR 2025, poster, arXiv:2410.09024
127 citations
#2
Tamper-Resistant Safeguards for Open-Weight LLMs
Rishub Tamirisa, Bhrugu Bharathi, Long Phan et al.
ICLR 2025, poster, arXiv:2408.00761
108 citations