Yangsibo Huang

Papers: 10

Total Citations: 727

Papers (10)

Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation

MUSE: Machine Unlearning Six-Way Evaluation for Language Models

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal

Scaling Laws for Differentially Private Language Models

Scaling Embedding Layers in Language Models

Position: A Safe Harbor for AI Evaluation and Red Teaming

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

Evaluating Gradient Inversion Attacks and Defenses in Federated Learning

Recovering Private Text in Federated Learning of Language Models

Sparsity-Preserving Differentially Private Training of Large Embedding Models