Yangsibo Huang
10
Papers
727
Total Citations
Papers (10)
Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation
ICLR 2024
412
citations
MUSE: Machine Unlearning Six-Way Evaluation for Language Models
ICLR 2025 (arXiv)
157
citations
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
ICLR 2025 (arXiv)
141
citations
Scaling Laws for Differentially Private Language Models
ICML 2025
9
citations
Scaling Embedding Layers in Language Models
NeurIPS 2025
8
citations
Position: A Safe Harbor for AI Evaluation and Red Teaming
ICML 2024
0
citations
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
ICML 2024
0
citations
Evaluating Gradient Inversion Attacks and Defenses in Federated Learning
NeurIPS 2021
0
citations
Recovering Private Text in Federated Learning of Language Models
NeurIPS 2022
0
citations
Sparsity-Preserving Differentially Private Training of Large Embedding Models
NeurIPS 2023
0
citations