NEURIPS 2025 "transformer interpretability" Papers
5 papers found
Beyond Components: Singular Vector-Based Interpretability of Transformer Circuits
Areeb Ahmad, Abhinav Joshi, Ashutosh Modi
NEURIPS 2025 poster, arXiv:2511.20273
EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification
Lin Zhang, Wenshuo Dong, Zhuoran Zhang et al.
NEURIPS 2025 poster, arXiv:2502.06852, 9 citations
FlowPrune: Accelerating Attention Flow Calculation by Pruning Flow Network
Shuo Xu, Yu Chen, Shuxia Lin et al.
NEURIPS 2025 poster
Pinpointing Attention-Causal Communication in Language Models
Gabriel Franco, Mark Crovella
NEURIPS 2025 poster
The Atlas of In-Context Learning: How Attention Heads Shape In-Context Retrieval Augmentation
Patrick Kahardipraja, Reduan Achtibat, Thomas Wiegand et al.
NEURIPS 2025 poster, arXiv:2505.15807, 4 citations