NEURIPS 2025 "transformer interpretability" Papers

5 papers found

Filters:NEURIPS 2025 transformer interpretability Clear all

Conference

AAAI 2025 (3,028)CVPR 2025 (2,873)ICCV 2025 (2,701)ICLR 2025 (3,827)ICML 2025 (3,340)ISMAR 2025 (229)NEURIPS 2025 (5,858)AAAI 2024 (2,289)CVPR 2024 (2,716)ECCV 2024 (2,387)ICLR 2024 (2,297)ICML 2024 (2,635)

Paper Type

poster (24,624)paper (8,140)oral (1,594)spotlight (1,421)highlight (975)

Beyond Components: Singular Vector-Based Interpretability of Transformer Circuits

Areeb Ahmad, Abhinav Joshi, Ashutosh Modi

NEURIPS 2025posterarXiv:2511.20273

EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification

Lin Zhang, Wenshuo Dong, Zhuoran Zhang et al.

NEURIPS 2025posterarXiv:2502.06852

FlowPrune: Accelerating Attention Flow Calculation by Pruning Flow Network

Shuo Xu, Yu Chen, Shuxia Lin et al.

NEURIPS 2025poster

Pinpointing Attention-Causal Communication in Language Models

Gabriel Franco, Mark Crovella

NEURIPS 2025poster

The Atlas of In-Context Learning: How Attention Heads Shape In-Context Retrieval Augmentation

Patrick Kahardipraja, Reduan Achtibat, Thomas Wiegand et al.

NEURIPS 2025posterarXiv:2505.15807