2025 Poster "model interpretability" Papers

16 papers found

AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution

Fengyuan Liu, Nikhil Kandpal, Colin Raffel

ICLR 2025 poster · arXiv:2411.15102 · 12 citations

Cognitive Mirrors: Exploring the Diverse Functional Roles of Attention Heads in LLM Reasoning

Xueqi Ma, Jun Wang, Yanbei Jiang et al.

NeurIPS 2025 poster · arXiv:2512.10978 · 1 citation

Concept Bottleneck Language Models For Protein Design

Aya Ismail, Tuomas Oikarinen, Amy Wang et al.

ICLR 2025 poster · arXiv:2411.06090 · 13 citations

Data-centric Prediction Explanation via Kernelized Stein Discrepancy

Mahtab Sarvmaili, Hassan Sajjad, Ga Wu

ICLR 2025 poster · arXiv:2403.15576 · 2 citations

Dataset Distillation for Pre-Trained Self-Supervised Vision Models

George Cazenavette, Antonio Torralba, Vincent Sitzmann

NeurIPS 2025 poster · arXiv:2511.16674

Defining and Discovering Hyper-meta-paths for Heterogeneous Hypergraphs

Yaming Yang, Ziyu Zheng, Weigang Lu et al.

NeurIPS 2025 poster

Discovering Influential Neuron Path in Vision Transformers

Yifan Wang, Yifei Liu, Yingdong Shi et al.

ICLR 2025 poster · arXiv:2503.09046 · 4 citations

From Search to Sampling: Generative Models for Robust Algorithmic Recourse

Prateek Garg, Lokesh Nagalapatti, Sunita Sarawagi

ICLR 2025 poster · arXiv:2505.07351 · 2 citations

LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching

Zhuo Cao, Xuan Zhao, Lena Krieger et al.

NeurIPS 2025 poster · arXiv:2510.14623 · 1 citation

Manipulating Feature Visualizations with Gradient Slingshots

Dilyara Bareeva, Marina Höhne, Alexander Warnecke et al.

NeurIPS 2025 poster · arXiv:2401.06122 · 6 citations

Register and [CLS] tokens induce a decoupling of local and global features in large ViTs

Alexander Lappe, Martin Giese

NeurIPS 2025 poster

SHAP zero Explains Biological Sequence Models with Near-zero Marginal Cost for Future Queries

Darin Tsui, Aryan Musharaf, Yigit Efe Erginbas et al.

NeurIPS 2025 poster · arXiv:2410.19236 · 2 citations

Smoothed Differentiation Efficiently Mitigates Shattered Gradients in Explanations

Adrian Hill, Neal McKee, Johannes Maeß et al.

NeurIPS 2025 poster

Start Smart: Leveraging Gradients For Enhancing Mask-based XAI Methods

Buelent Uendes, Shujian Yu, Mark Hoogendoorn

ICLR 2025 poster

Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties

Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.

NeurIPS 2025 poster · arXiv:2506.05744 · 13 citations

Unveiling Concept Attribution in Diffusion Models

Nguyen Hung-Quang, Hoang Phan, Khoa D Doan

NeurIPS 2025 poster · arXiv:2412.02542 · 4 citations