2024 "safety evaluation" Papers
3 papers found
Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
Danny Halawi, Alexander Wei, Eric Wallace et al.
ICML 2024 · poster · arXiv:2406.20053
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
Xin Liu, Yichen Zhu, Jindong Gu et al.
ECCV 2024 · poster · arXiv:2311.17600 · 183 citations
Position: TrustLLM: Trustworthiness in Large Language Models
Yue Huang, Lichao Sun, Haoran Wang et al.
ICML 2024 · poster