Yangsibo Huang
11 Papers · 750 Total Citations
Papers (11)
Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation
ICLR 2024 · arXiv · 412 citations
MUSE: Machine Unlearning Six-Way Evaluation for Language Models
ICLR 2025 · arXiv · 157 citations
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
ICLR 2025 · arXiv · 141 citations
Fantastic Copyrighted Beasts and How (Not) to Generate Them
ICLR 2025 · arXiv · 23 citations
Scaling Laws for Differentially Private Language Models
ICML 2025 · arXiv · 9 citations
Scaling Embedding Layers in Language Models
NeurIPS 2025 · arXiv · 8 citations
Position: A Safe Harbor for AI Evaluation and Red Teaming
ICML 2024 · 0 citations
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
ICML 2024 · arXiv · 0 citations
Evaluating Gradient Inversion Attacks and Defenses in Federated Learning
NeurIPS 2021 · arXiv · 0 citations
Recovering Private Text in Federated Learning of Language Models
NeurIPS 2022 · arXiv · 0 citations
Sparsity-Preserving Differentially Private Training of Large Embedding Models
NeurIPS 2023 · arXiv · 0 citations