Neel Nanda
4 Papers
191 Total Citations
Papers (4)
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
ICLR 2025, arXiv
77 citations
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
ICLR 2025, arXiv
63 citations
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
ICML 2025
51 citations
Explorations of Self-Repair in Language Models
ICML 2024
0 citations