Poster Papers Matching "model interpretability"
21 papers found
AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution
Fengyuan Liu, Nikhil Kandpal, Colin Raffel
Cognitive Mirrors: Exploring the Diverse Functional Roles of Attention Heads in LLM Reasoning
Xueqi Ma, Jun Wang, Yanbei Jiang et al.
Concept Bottleneck Language Models For Protein Design
Aya Ismail, Tuomas Oikarinen, Amy Wang et al.
Data-centric Prediction Explanation via Kernelized Stein Discrepancy
Mahtab Sarvmaili, Hassan Sajjad, Ga Wu
Discovering Influential Neuron Path in Vision Transformers
Yifan Wang, Yifei Liu, Yingdong Shi et al.
LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching
Zhuo Cao, Xuan Zhao, Lena Krieger et al.
Manipulating Feature Visualizations with Gradient Slingshots
Dilyara Bareeva, Marina Höhne, Alexander Warnecke et al.
SHAP zero Explains Biological Sequence Models with Near-zero Marginal Cost for Future Queries
Darin Tsui, Aryan Musharaf, Yigit Efe Erginbas et al.
Smoothed Differentiation Efficiently Mitigates Shattered Gradients in Explanations
Adrian Hill, Neal McKee, Johannes Maeß et al.
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.
Unveiling Concept Attribution in Diffusion Models
Nguyen Hung-Quang, Hoang Phan, Khoa D Doan
Attribution-based Explanations that Provide Recourse Cannot be Robust
Hidde Fokkema, Rianne de Heide, Tim van Erven
Explaining Graph Neural Networks via Structure-aware Interaction Index
Ngoc Bui, Trung Hieu Nguyen, Viet Anh Nguyen et al.
Exploring the LLM Journey from Cognition to Expression with Linear Representations
Yuzi Yan, Jialian Li, Yipin Zhang et al.
Improving Neural Additive Models with Bayesian Principles
Kouroche Bouchiat, Alexander Immer, Hugo Yèche et al.
Iterative Search Attribution for Deep Neural Networks
Zhiyu Zhu, Huaming Chen, Xinyi Wang et al.
KernelSHAP-IQ: Weighted Least Square Optimization for Shapley Interactions
Fabian Fumagalli, Maximilian Muschalik, Patrick Kolpaczki et al.
On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box
Yi Cai, Gerhard Wunder
Position: Cracking the Code of Cascading Disparity Towards Marginalized Communities
Golnoosh Farnadi, Mohammad Havaei, Negar Rostamzadeh
Position: Stop Making Unscientific AGI Performance Claims
Patrick Altmeyer, Andrew Demetriou, Antony Bartlett et al.
Provably Better Explanations with Optimized Aggregation of Feature Attributions
Thomas Decker, Ananta Bhattarai, Jindong Gu et al.