Kaixuan Huang
3 Papers
156 Total Citations
Papers (3)
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
ICLR 2025, arXiv
141 citations
Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models
ICML 2025
15 citations
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
ICML 2024
0 citations