NeurIPS 2025 "transformer interpretability" Papers
3 papers found
Beyond Components: Singular Vector-Based Interpretability of Transformer Circuits
Areeb Ahmad, Abhinav Joshi, Ashutosh Modi
NeurIPS 2025 poster, arXiv:2511.20273
EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification
Lin Zhang, Wenshuo Dong, Zhuoran Zhang et al.
NeurIPS 2025 poster, arXiv:2502.06852, 9 citations
Pinpointing Attention-Causal Communication in Language Models
Gabriel Franco, Mark Crovella
NeurIPS 2025 poster