Long Phan
4 Papers
139 Total Citations
Papers (4)
Tamper-Resistant Safeguards for Open-Weight LLMs
ICLR 2025, arXiv
108 citations
Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
NeurIPS 2025
31 citations
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
ICML 2024
0 citations
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
NeurIPS 2022
0 citations