Yangsibo Huang
7
Papers
727
Total Citations
Papers (7)
Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation
ICLR 2024
412
citations
MUSE: Machine Unlearning Six-Way Evaluation for Language Models
ICLR 2025, arXiv
157
citations
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
ICLR 2025, arXiv
141
citations
Scaling Laws for Differentially Private Language Models
ICML 2025
9
citations
Scaling Embedding Layers in Language Models
NeurIPS 2025
8
citations
Position: A Safe Harbor for AI Evaluation and Red Teaming
ICML 2024
0
citations
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
ICML 2024
0
citations