Poster papers: "transformer interpretability"
5 papers found
Beyond Components: Singular Vector-Based Interpretability of Transformer Circuits
Areeb Ahmad, Abhinav Joshi, Ashutosh Modi
NeurIPS 2025 poster, arXiv:2511.20273
Pinpointing Attention-Causal Communication in Language Models
Gabriel Franco, Mark Crovella
NeurIPS 2025 poster
Selective Induction Heads: How Transformers Select Causal Structures in Context
Francesco D'Angelo, Francesco Croce, Nicolas Flammarion
ICLR 2025 poster, arXiv:2509.08184, 4 citations
Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers
Omer Sahin Tas, Royden Wagner
ICLR 2025 poster, arXiv:2406.11624, 4 citations
AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers
Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer et al.
ICML 2024 poster