"mechanistic interpretability" Papers
4 papers found
Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality
Sewoong Lee, Adam Davies, Marc E. Canby et al.
COLM 2025 paper, arXiv:2503.24277, 2 citations
Extracting Interpretable Task-Specific Circuits from Large Language Models for Faster Inference
Jorge García-Carrasco, Alejandro Maté, Juan Trujillo
AAAI 2025 paper, arXiv:2412.15750, 3 citations
Truth-value judgment in language models: ‘truth directions’ are context sensitive
Stefan F. Schouten, Peter Bloem, Ilia Markov et al.
COLM 2025 paper
Visual Representations inside the Language Model
Benlin Liu, Amita Kamath, Madeleine Grunde-McLaughlin et al.
COLM 2025 paper, 2 citations