NEURIPS 2025 "model interpretability" Papers
17 papers found
Additive Models Explained: A Computational Complexity Approach
Shahaf Bassan, Michal Moshkovitz, Guy Katz
Cognitive Mirrors: Exploring the Diverse Functional Roles of Attention Heads in LLM Reasoning
Xueqi Ma, Jun Wang, Yanbei Jiang et al.
Dataset Distillation for Pre-Trained Self-Supervised Vision Models
George Cazenavette, Antonio Torralba, Vincent Sitzmann
DATE-LM: Benchmarking Data Attribution Evaluation for Large Language Models
Cathy Jiao, Yijun Pan, Emily Xiao et al.
Defining and Discovering Hyper-meta-paths for Heterogeneous Hypergraphs
Yaming Yang, Ziyu Zheng, Weigang Lu et al.
Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning
Chen Qian, Dongrui Liu, Hao Wen et al.
Dense SAE Latents Are Features, Not Bugs
Xiaoqing Sun, Alessandro Stolfo, Joshua Engels et al.
LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching
Zhuo Cao, Xuan Zhao, Lena Krieger et al.
Localizing Knowledge in Diffusion Transformers
Arman Zarei, Samyadeep Basu, Keivan Rezaei et al.
Manipulating Feature Visualizations with Gradient Slingshots
Dilyara Bareeva, Marina Höhne, Alexander Warnecke et al.
Register and [CLS] tokens induce a decoupling of local and global features in large ViTs
Alexander Lappe, Martin Giese
Self-Assembling Graph Perceptrons
Jialong Chen, Tong Wang, Bowen Deng et al.
SHAP zero Explains Biological Sequence Models with Near-zero Marginal Cost for Future Queries
Darin Tsui, Aryan Musharaf, Yigit Efe Erginbas et al.
Smoothed Differentiation Efficiently Mitigates Shattered Gradients in Explanations
Adrian Hill, Neal McKee, Johannes Maeß et al.
The Fragile Truth of Saliency: Improving LLM Input Attribution via Attention Bias Optimization
Yihua Zhang, Changsheng Wang, Yiwei Chen et al.
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.
Unveiling Concept Attribution in Diffusion Models
Nguyen Hung-Quang, Hoang Phan, Khoa D Doan