NEURIPS Poster "model interpretability" Papers

14 papers found

Additive Models Explained: A Computational Complexity Approach

Shahaf Bassan, Michal Moshkovitz, Guy Katz

NEURIPS 2025posterarXiv:2510.21292
1
citations

Cognitive Mirrors: Exploring the Diverse Functional Roles of Attention Heads in LLM Reasoning

Xueqi Ma, Jun Wang, Yanbei Jiang et al.

NEURIPS 2025posterarXiv:2512.10978
2
citations

Dataset Distillation for Pre-Trained Self-Supervised Vision Models

George Cazenavette, Antonio Torralba, Vincent Sitzmann

NEURIPS 2025posterarXiv:2511.16674
1
citations

DATE-LM: Benchmarking Data Attribution Evaluation for Large Language Models

Cathy Jiao, Yijun Pan, Emily Xiao et al.

NEURIPS 2025posterarXiv:2507.09424

Defining and Discovering Hyper-meta-paths for Heterogeneous Hypergraphs

Yaming Yang, Ziyu Zheng, Weigang Lu et al.

NEURIPS 2025poster

Dense SAE Latents Are Features, Not Bugs

Xiaoqing Sun, Alessandro Stolfo, Joshua Engels et al.

NEURIPS 2025posterarXiv:2506.15679
6
citations

LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching

Zhuo Cao, Xuan Zhao, Lena Krieger et al.

NEURIPS 2025posterarXiv:2510.14623
1
citations

Localizing Knowledge in Diffusion Transformers

Arman Zarei, Samyadeep Basu, Keivan Rezaei et al.

NEURIPS 2025posterarXiv:2505.18832
1
citations

Manipulating Feature Visualizations with Gradient Slingshots

Dilyara Bareeva, Marina Höhne, Alexander Warnecke et al.

NEURIPS 2025posterarXiv:2401.06122
6
citations

Register and [CLS] tokens induce a decoupling of local and global features in large ViTs

Alexander Lappe, Martin Giese

NEURIPS 2025poster
3
citations

SHAP zero Explains Biological Sequence Models with Near-zero Marginal Cost for Future Queries

Darin Tsui, Aryan Musharaf, Yigit Efe Erginbas et al.

NEURIPS 2025posterarXiv:2410.19236
2
citations

Smoothed Differentiation Efficiently Mitigates Shattered Gradients in Explanations

Adrian Hill, Neal McKee, Johannes Maeß et al.

NEURIPS 2025poster

Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties

Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.

NEURIPS 2025posterarXiv:2506.05744
13
citations

Unveiling Concept Attribution in Diffusion Models

Nguyen Hung-Quang, Hoang Phan, Khoa D Doan

NEURIPS 2025posterarXiv:2412.02542
4
citations