"model interpretability" Papers
25 papers found
AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution
Fengyuan Liu, Nikhil Kandpal, Colin Raffel
Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning
Discovering Influential Neuron Path in Vision Transformers
Yifan Wang, Yifei Liu, Yingdong Shi et al.
LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching
Zhuo Cao, Xuan Zhao, Lena Krieger et al.
Looking Inward: Language Models Can Learn About Themselves by Introspection
Felix Jedidja Binder, James Chua, Tomek Korbak et al.
Manipulating Feature Visualizations with Gradient Slingshots
Dilyara Bareeva, Marina Höhne, Alexander Warnecke et al.
SHAP zero Explains Biological Sequence Models with Near-zero Marginal Cost for Future Queries
Darin Tsui, Aryan Musharaf, Yigit Efe Erginbas et al.
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.
Unveiling Concept Attribution in Diffusion Models
Nguyen Hung-Quang, Hoang Phan, Khoa D Doan
Accelerating the Global Aggregation of Local Explanations
Alon Mor, Yonatan Belinkov, Benny Kimelfeld
Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention
Saebom Leem, Hyunseok Seo
Attribution-based Explanations that Provide Recourse Cannot be Robust
Hidde Fokkema, Rianne de Heide, Tim van Erven
Explaining Graph Neural Networks via Structure-aware Interaction Index
Ngoc Bui, Trung Hieu Nguyen, Viet Anh Nguyen et al.
Explaining Probabilistic Models with Distributional Values
Luca Franceschi, Michele Donini, Cedric Archambeau et al.
Exploring the LLM Journey from Cognition to Expression with Linear Representations
Yuzi Yan, Jialian Li, Yipin Zhang et al.
Improving Neural Additive Models with Bayesian Principles
Kouroche Bouchiat, Alexander Immer, Hugo Yèche et al.
Iterative Search Attribution for Deep Neural Networks
Zhiyu Zhu, Huaming Chen, Xinyi Wang et al.
KernelSHAP-IQ: Weighted Least Square Optimization for Shapley Interactions
Fabian Fumagalli, Maximilian Muschalik, Patrick Kolpaczki et al.
MAPTree: Beating “Optimal” Decision Trees with Bayesian Decision Trees
Colin Sullivan, Mo Tiwari, Sebastian Thrun
MFABA: A More Faithful and Accelerated Boundary-Based Attribution Method for Deep Neural Networks
Zhiyu Zhu, Huaming Chen, Jiayu Zhang et al.
On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box
Yi Cai, Gerhard Wunder
Position: Cracking the Code of Cascading Disparity Towards Marginalized Communities
Golnoosh Farnadi, Mohammad Havaei, Negar Rostamzadeh
Position: Stop Making Unscientific AGI Performance Claims
Patrick Altmeyer, Andrew Demetriou, Antony Bartlett et al.
Provably Better Explanations with Optimized Aggregation of Feature Attributions
Thomas Decker, Ananta Bhattarai, Jindong Gu et al.
Towards Modeling Uncertainties of Self-explaining Neural Networks via Conformal Prediction
Wei Qian, Chenxu Zhao, Yangyi Li et al.