Poster "robustness improvement" Papers
3 papers found
Robust LLM safeguarding via refusal feature adversarial training
Lei Yu, Virginie Do, Karen Hambardzumyan et al.
ICLR 2025 · poster · arXiv:2409.20089
Transformers Learn Low Sensitivity Functions: Investigations and Implications
Bhavya Vasudeva, Deqing Fu, Tianyi Zhou et al.
ICLR 2025 · poster · arXiv:2403.06925 · 7 citations
PIDformer: Transformer Meets Control Theory
Tam Nguyen, Cesar Uribe, Tan Nguyen et al.
ICML 2024 · poster