2025 Poster "model safety" Papers
4 papers found
On Large Language Model Continual Unlearning
Chongyang Gao, Lixu Wang, Kaize Ding et al.
ICLR 2025 poster · arXiv:2407.10223
26 citations
On the Role of Attention Heads in Large Language Model Safety
Zhenhong Zhou, Haiyang Yu, Xinghua Zhang et al.
ICLR 2025 poster · arXiv:2410.13708
40 citations
Robust LLM safeguarding via refusal feature adversarial training
Lei Yu, Virginie Do, Karen Hambardzumyan et al.
ICLR 2025 poster · arXiv:2409.20089
VLMs can Aggregate Scattered Training Patches
Zhanhui Zhou, Lingjie Chen, Chao Yang et al.
NeurIPS 2025 poster · arXiv:2506.03614