Poster papers matching "large language models safety"
2 papers found
Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation
Tiansheng Huang, Sihao Hu, Fatih Ilhan et al.
ICLR 2025 (poster) · arXiv:2409.01586
57 citations
Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models
Guobin Shen, Dongcheng Zhao, Yiting Dong et al.
ICLR 2025 (poster) · arXiv:2410.02298
11 citations