2024 Poster Papers on "safety alignment"
4 papers found
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Boyi Wei, Kaixuan Huang, Yangsibo Huang et al.
ICML 2024 · poster · arXiv:2402.05162
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Junyuan Hong, Jinhao Duan, Chenhui Zhang et al.
ICML 2024 · poster · arXiv:2403.15447
PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition
Ziyang Zhang, Qizhen Zhang, Jakob Foerster
ICML 2024 · poster · arXiv:2405.07932
Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models
Yongshuo Zong, Ondrej Bohdal, Tingyang Yu et al.
ICML 2024 · poster · arXiv:2402.02207