Yangsibo Huang

11 Papers

750 Total Citations

Papers (11)

Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation

MUSE: Machine Unlearning Six-Way Evaluation for Language Models

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal

Fantastic Copyrighted Beasts and How (Not) to Generate Them

Scaling Laws for Differentially Private Language Models

Scaling Embedding Layers in Language Models

NeurIPS 2025, arXiv

Position: A Safe Harbor for AI Evaluation and Red Teaming

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

Evaluating Gradient Inversion Attacks and Defenses in Federated Learning

NeurIPS 2021, arXiv

Recovering Private Text in Federated Learning of Language Models

NeurIPS 2022, arXiv

Sparsity-Preserving Differentially Private Training of Large Embedding Models

NeurIPS 2023, arXiv