"safety benchmark" Papers
3 papers found
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents
Thomas Kuntz, Agatha Duzan, Hao Zhao et al.
NeurIPS 2025 (spotlight), arXiv:2506.14866
18 citations
SafeVid: Toward Safety Aligned Video Large Multimodal Models
Yixu Wang, Jiaxin Song, Yifeng Gao et al.
NeurIPS 2025 (poster), arXiv:2505.11926
3 citations
SAGE-Eval: Evaluating LLMs for Systematic Generalizations of Safety Facts
Yueh-Han Chen, Guy Davidson, Brenden Lake
NeurIPS 2025 (spotlight), arXiv:2505.21828
1 citation