Papers matching "safety evaluation"
4 papers found
AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories
Yi Zeng, Yu Yang, Andy Zhou et al.
ICLR 2025 poster
Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
Danny Halawi, Alexander Wei, Eric Wallace et al.
ICML 2024 poster
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
Xin Liu, Yichen Zhu, Jindong Gu et al.
ECCV 2024 poster
arXiv:2311.17600
183 citations
Position: TrustLLM: Trustworthiness in Large Language Models
Yue Huang, Lichao Sun, Haoran Wang et al.
ICML 2024 poster