2025 Poster "fine-tuning attacks" Papers
3 papers found
CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment
Qinfeng Li, Tianyue Luo, Xuhong Zhang et al.
NeurIPS 2025 (poster) · arXiv:2410.13903 · 7 citations
Safety Alignment Should be Made More Than Just a Few Tokens Deep
Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu et al.
ICLR 2025 (poster) · arXiv:2406.05946 · 287 citations
Tamper-Resistant Safeguards for Open-Weight LLMs
Rishub Tamirisa, Bhrugu Bharathi, Long Phan et al.
ICLR 2025 (poster) · arXiv:2408.00761 · 108 citations