Neel Nanda

Papers: 4

Total Citations: 191

Papers (4)

Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models

Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability

Explorations of Self-Repair in Language Models