Poster "model interpretability" Papers

21 papers found

AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution

Fengyuan Liu, Nikhil Kandpal, Colin Raffel

ICLR 2025, poster, arXiv:2411.15102, 12 citations

Cognitive Mirrors: Exploring the Diverse Functional Roles of Attention Heads in LLM Reasoning

Xueqi Ma, Jun Wang, Yanbei Jiang et al.

NeurIPS 2025, poster, arXiv:2512.10978, 1 citation

Concept Bottleneck Language Models For Protein Design

Aya Ismail, Tuomas Oikarinen, Amy Wang et al.

ICLR 2025, poster, arXiv:2411.06090, 13 citations

Data-centric Prediction Explanation via Kernelized Stein Discrepancy

Mahtab Sarvmaili, Hassan Sajjad, Ga Wu

ICLR 2025, poster, arXiv:2403.15576, 2 citations

Discovering Influential Neuron Path in Vision Transformers

Yifan Wang, Yifei Liu, Yingdong Shi et al.

ICLR 2025, poster, arXiv:2503.09046, 4 citations

LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching

Zhuo Cao, Xuan Zhao, Lena Krieger et al.

NeurIPS 2025, poster, arXiv:2510.14623, 1 citation

Manipulating Feature Visualizations with Gradient Slingshots

Dilyara Bareeva, Marina Höhne, Alexander Warnecke et al.

NeurIPS 2025, poster, arXiv:2401.06122, 6 citations

SHAP zero Explains Biological Sequence Models with Near-zero Marginal Cost for Future Queries

Darin Tsui, Aryan Musharaf, Yigit Efe Erginbas et al.

NeurIPS 2025, poster, arXiv:2410.19236, 2 citations

Smoothed Differentiation Efficiently Mitigates Shattered Gradients in Explanations

Adrian Hill, Neal McKee, Johannes Maeß et al.

NeurIPS 2025, poster

Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties

Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.

NeurIPS 2025, poster, arXiv:2506.05744, 13 citations

Unveiling Concept Attribution in Diffusion Models

Nguyen Hung-Quang, Hoang Phan, Khoa D Doan

NeurIPS 2025, poster, arXiv:2412.02542, 4 citations

Attribution-based Explanations that Provide Recourse Cannot be Robust

Hidde Fokkema, Rianne de Heide, Tim van Erven

ICML 2024, poster

Explaining Graph Neural Networks via Structure-aware Interaction Index

Ngoc Bui, Trung Hieu Nguyen, Viet Anh Nguyen et al.

ICML 2024, poster

Exploring the LLM Journey from Cognition to Expression with Linear Representations

Yuzi Yan, Jialian Li, Yipin Zhang et al.

ICML 2024, poster

Improving Neural Additive Models with Bayesian Principles

Kouroche Bouchiat, Alexander Immer, Hugo Yèche et al.

ICML 2024, poster

Iterative Search Attribution for Deep Neural Networks

Zhiyu Zhu, Huaming Chen, Xinyi Wang et al.

ICML 2024, poster

KernelSHAP-IQ: Weighted Least Square Optimization for Shapley Interactions

Fabian Fumagalli, Maximilian Muschalik, Patrick Kolpaczki et al.

ICML 2024, poster

On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box

Yi Cai, Gerhard Wunder

ICML 2024, poster

Position: Cracking the Code of Cascading Disparity Towards Marginalized Communities

Golnoosh Farnadi, Mohammad Havaei, Negar Rostamzadeh

ICML 2024, poster

Position: Stop Making Unscientific AGI Performance Claims

Patrick Altmeyer, Andrew Demetriou, Antony Bartlett et al.

ICML 2024, poster

Provably Better Explanations with Optimized Aggregation of Feature Attributions

Thomas Decker, Ananta Bhattarai, Jindong Gu et al.

ICML 2024, poster