"model interpretability" Papers

25 papers found

AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution

Fengyuan Liu, Nikhil Kandpal, Colin Raffel

ICLR 2025 (poster) · arXiv:2411.15102
12 citations

Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning

NeurIPS 2025 · arXiv:2506.02867

Discovering Influential Neuron Path in Vision Transformers

Yifan Wang, Yifei Liu, Yingdong Shi et al.

ICLR 2025 (poster) · arXiv:2503.09046
4 citations

LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching

Zhuo Cao, Xuan Zhao, Lena Krieger et al.

NeurIPS 2025 (poster) · arXiv:2510.14623
1 citation

Looking Inward: Language Models Can Learn About Themselves by Introspection

Felix Jedidja Binder, James Chua, Tomek Korbak et al.

ICLR 2025 (oral) · arXiv:2410.13787
40 citations

Manipulating Feature Visualizations with Gradient Slingshots

Dilyara Bareeva, Marina Höhne, Alexander Warnecke et al.

NeurIPS 2025 (poster) · arXiv:2401.06122
6 citations

SHAP zero Explains Biological Sequence Models with Near-zero Marginal Cost for Future Queries

Darin Tsui, Aryan Musharaf, Yigit Efe Erginbas et al.

NeurIPS 2025 (poster) · arXiv:2410.19236
2 citations

Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties

Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.

NeurIPS 2025 (poster) · arXiv:2506.05744
13 citations

Unveiling Concept Attribution in Diffusion Models

Nguyen Hung-Quang, Hoang Phan, Khoa D Doan

NeurIPS 2025 (poster) · arXiv:2412.02542
4 citations

Accelerating the Global Aggregation of Local Explanations

Alon Mor, Yonatan Belinkov, Benny Kimelfeld

AAAI 2024 (paper) · arXiv:2312.07991
6 citations

Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention

Saebom Leem, Hyunseok Seo

AAAI 2024 (paper) · arXiv:2402.04563
31 citations

Attribution-based Explanations that Provide Recourse Cannot be Robust

Hidde Fokkema, Rianne de Heide, Tim van Erven

ICML 2024 (poster)

Explaining Graph Neural Networks via Structure-aware Interaction Index

Ngoc Bui, Trung Hieu Nguyen, Viet Anh Nguyen et al.

ICML 2024 (poster)

Explaining Probabilistic Models with Distributional Values

Luca Franceschi, Michele Donini, Cedric Archambeau et al.

ICML 2024 (spotlight)

Exploring the LLM Journey from Cognition to Expression with Linear Representations

Yuzi Yan, Jialian Li, Yipin Zhang et al.

ICML 2024 (poster)

Improving Neural Additive Models with Bayesian Principles

Kouroche Bouchiat, Alexander Immer, Hugo Yèche et al.

ICML 2024 (poster)

Iterative Search Attribution for Deep Neural Networks

Zhiyu Zhu, Huaming Chen, Xinyi Wang et al.

ICML 2024 (poster)

KernelSHAP-IQ: Weighted Least Square Optimization for Shapley Interactions

Fabian Fumagalli, Maximilian Muschalik, Patrick Kolpaczki et al.

ICML 2024 (poster)

MAPTree: Beating “Optimal” Decision Trees with Bayesian Decision Trees

Colin Sullivan, Mo Tiwari, Sebastian Thrun

AAAI 2024 (paper) · arXiv:2309.15312

MFABA: A More Faithful and Accelerated Boundary-Based Attribution Method for Deep Neural Networks

Zhiyu Zhu, Huaming Chen, Jiayu Zhang et al.

AAAI 2024 (paper) · arXiv:2312.13630
14 citations

On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box

Yi Cai, Gerhard Wunder

ICML 2024 (poster)

Position: Cracking the Code of Cascading Disparity Towards Marginalized Communities

Golnoosh Farnadi, Mohammad Havaei, Negar Rostamzadeh

ICML 2024 (poster)

Position: Stop Making Unscientific AGI Performance Claims

Patrick Altmeyer, Andrew Demetriou, Antony Bartlett et al.

ICML 2024 (poster)

Provably Better Explanations with Optimized Aggregation of Feature Attributions

Thomas Decker, Ananta Bhattarai, Jindong Gu et al.

ICML 2024 (poster)

Towards Modeling Uncertainties of Self-explaining Neural Networks via Conformal Prediction

Wei Qian, Chenxu Zhao, Yangyi Li et al.

AAAI 2024 (paper) · arXiv:2401.01549
10 citations