Mengzhou Xia
6
Papers
486
Total Citations
Papers (6)
Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation
ICLR 2024
412
citations
The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning
NeurIPS 2025, arXiv
74
citations
Trainable Transformer in Transformer
ICML 2024
0
citations
LESS: Selecting Influential Data for Targeted Instruction Tuning
ICML 2024
0
citations
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
ICML 2024
0
citations
Language Models as Science Tutors
ICML 2024
0
citations