2025 Poster "model interpretability" Papers
16 papers found
AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution
Fengyuan Liu, Nikhil Kandpal, Colin Raffel
ICLR 2025 poster · arXiv:2411.15102 · 12 citations
Cognitive Mirrors: Exploring the Diverse Functional Roles of Attention Heads in LLM Reasoning
Xueqi Ma, Jun Wang, Yanbei Jiang et al.
NeurIPS 2025 poster · arXiv:2512.10978 · 1 citation
Concept Bottleneck Language Models For Protein Design
Aya Ismail, Tuomas Oikarinen, Amy Wang et al.
ICLR 2025 poster · arXiv:2411.06090 · 13 citations
Data-centric Prediction Explanation via Kernelized Stein Discrepancy
Mahtab Sarvmaili, Hassan Sajjad, Ga Wu
ICLR 2025 poster · arXiv:2403.15576 · 2 citations
Dataset Distillation for Pre-Trained Self-Supervised Vision Models
George Cazenavette, Antonio Torralba, Vincent Sitzmann
NeurIPS 2025 poster · arXiv:2511.16674
Defining and Discovering Hyper-meta-paths for Heterogeneous Hypergraphs
Yaming Yang, Ziyu Zheng, Weigang Lu et al.
NeurIPS 2025 poster
Discovering Influential Neuron Path in Vision Transformers
Yifan Wang, Yifei Liu, Yingdong Shi et al.
ICLR 2025 poster · arXiv:2503.09046 · 4 citations
From Search to Sampling: Generative Models for Robust Algorithmic Recourse
Prateek Garg, Lokesh Nagalapatti, Sunita Sarawagi
ICLR 2025 poster · arXiv:2505.07351 · 2 citations
LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching
Zhuo Cao, Xuan Zhao, Lena Krieger et al.
NeurIPS 2025 poster · arXiv:2510.14623 · 1 citation
Manipulating Feature Visualizations with Gradient Slingshots
Dilyara Bareeva, Marina Höhne, Alexander Warnecke et al.
NeurIPS 2025 poster · arXiv:2401.06122 · 6 citations
Register and [CLS] tokens induce a decoupling of local and global features in large ViTs
Alexander Lappe, Martin Giese
NeurIPS 2025 poster
SHAP zero Explains Biological Sequence Models with Near-zero Marginal Cost for Future Queries
Darin Tsui, Aryan Musharaf, Yigit Efe Erginbas et al.
NeurIPS 2025 poster · arXiv:2410.19236 · 2 citations
Smoothed Differentiation Efficiently Mitigates Shattered Gradients in Explanations
Adrian Hill, Neal McKee, Johannes Maeß et al.
NeurIPS 2025 poster
Start Smart: Leveraging Gradients For Enhancing Mask-based XAI Methods
Buelent Uendes, Shujian Yu, Mark Hoogendoorn
ICLR 2025 poster
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.
NeurIPS 2025 poster · arXiv:2506.05744 · 13 citations
Unveiling Concept Attribution in Diffusion Models
Nguyen Hung-Quang, Hoang Phan, Khoa D Doan
NeurIPS 2025 poster · arXiv:2412.02542 · 4 citations