2025 "interpretability methods" Papers
4 papers found
Concept-Guided Interpretability via Neural Chunking
Shuchen Wu, Stephan Alaniz, Shyamgopal Karthik et al.
NeurIPS 2025 (poster), arXiv:2505.11576
Head Pursuit: Probing Attention Specialization in Multimodal Transformers
Lorenzo Basile, Valentino Maiorca, Diego Doimo et al.
NeurIPS 2025 (spotlight), arXiv:2510.21518
2 citations
Layer-Wise Modality Decomposition for Interpretable Multimodal Sensor Fusion
Jaehyun Park, Konyul Park, Daehun Kim et al.
NeurIPS 2025 (poster), arXiv:2511.00859
Residual Stream Analysis with Multi-Layer SAEs
Tim Lawson, Lucy Farnik, Conor Houghton et al.
ICLR 2025 (poster), arXiv:2409.04185
11 citations