"red teaming" Papers
3 papers found
IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves
Ruofan Wang, Juncheng Li, Yixu Wang et al.
ICCV 2025, poster, arXiv:2411.00827
8 citations
Improved Techniques for Optimization-Based Jailbreaking on Large Language Models
Xiaojun Jia, Tianyu Pang, Chao Du et al.
ICLR 2025, poster, arXiv:2405.21018
74 citations
Position: A Safe Harbor for AI Evaluation and Red Teaming
Shayne Longpre, Sayash Kapoor, Kevin Klyman et al.
ICML 2024, poster