2025 "safety evaluation" Papers
4 papers found
AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories
Yi Zeng, Yu Yang, Andy Zhou et al.
ICLR 2025poster
Causal Composition Diffusion Model for Closed-loop Traffic Generation
Haohong Lin, Xin Huang, Tung Phan-Minh et al.
CVPR 2025posterarXiv:2412.17920
11
citations
ProAdvPrompter: A Two-Stage Journey to Effective Adversarial Prompting for LLMs
Hao Di, Tong He, Haishan Ye et al.
ICLR 2025poster
2
citations
Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking
Benjamin Feuer, Micah Goldblum, Teresa Datta et al.
ICLR 2025posterarXiv:2409.15268
27
citations