ICLR 2025 "sparse autoencoders" Papers
5 papers found
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Javier Ferrando, Oscar Obeso, Senthooran Rajamanoharan et al.
ICLR 2025, poster, arXiv:2411.14257
77 citations
Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words
Gouki Minegishi, Hiroki Furuta, Yusuke Iwasawa et al.
ICLR 2025, poster, arXiv:2501.06254
9 citations
Scaling and evaluating sparse autoencoders
Leo Gao, Tom Dupre la Tour, Henk Tillman et al.
ICLR 2025, poster, arXiv:2406.04093
298 citations
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
Aleksandar Makelov, Georg Lange, Neel Nanda
ICLR 2025, poster, arXiv:2405.08366
63 citations
Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers
Omer Sahin Tas, Royden Wagner
ICLR 2025, poster, arXiv:2406.11624
4 citations