Poster "harmful content generation" Papers
5 papers found
Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them
Anh Bui, Thuy-Trang Vu, Long Vuong et al.
ICLR 2025, poster, arXiv:2501.18950
Information Retrieval Induced Safety Degradation in AI Agents
Cheng Yu, Benedikt Stroebl, Diyi Yang et al.
NeurIPS 2025, poster, arXiv:2505.14215
One Head to Rule Them All: Amplifying LVLM Safety through a Single Critical Attention Head
Junhao Xia, Haotian Zhu, Shuchao Pang et al.
NeurIPS 2025, poster
Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion
Sanghyun Kim, Seohyeon Jung, Balhae Kim et al.
ECCV 2024, poster, arXiv:2407.21032
9 citations
Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models
Yongshuo Zong, Ondrej Bohdal, Tingyang Yu et al.
ICML 2024, poster