NEURIPS "model interpretability" Papers

17 papers found

Additive Models Explained: A Computational Complexity Approach

Shahaf Bassan, Michal Moshkovitz, Guy Katz

NEURIPS 2025 · poster · arXiv:2510.21292 · 1 citation

Cognitive Mirrors: Exploring the Diverse Functional Roles of Attention Heads in LLM Reasoning

Xueqi Ma, Jun Wang, Yanbei Jiang et al.

NEURIPS 2025 · poster · arXiv:2512.10978 · 2 citations

Dataset Distillation for Pre-Trained Self-Supervised Vision Models

George Cazenavette, Antonio Torralba, Vincent Sitzmann

NEURIPS 2025 · poster · arXiv:2511.16674 · 1 citation

DATE-LM: Benchmarking Data Attribution Evaluation for Large Language Models

Cathy Jiao, Yijun Pan, Emily Xiao et al.

NEURIPS 2025 · poster · arXiv:2507.09424

Defining and Discovering Hyper-meta-paths for Heterogeneous Hypergraphs

Yaming Yang, Ziyu Zheng, Weigang Lu et al.

NEURIPS 2025 · poster

Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning

Chen Qian, Dongrui Liu, Hao Wen et al.

NEURIPS 2025 · arXiv:2506.02867 · 22 citations

Dense SAE Latents Are Features, Not Bugs

Xiaoqing Sun, Alessandro Stolfo, Joshua Engels et al.

NEURIPS 2025 · poster · arXiv:2506.15679 · 6 citations

LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching

Zhuo Cao, Xuan Zhao, Lena Krieger et al.

NEURIPS 2025 · poster · arXiv:2510.14623 · 1 citation

Localizing Knowledge in Diffusion Transformers

Arman Zarei, Samyadeep Basu, Keivan Rezaei et al.

NEURIPS 2025 · poster · arXiv:2505.18832 · 1 citation

Manipulating Feature Visualizations with Gradient Slingshots

Dilyara Bareeva, Marina Höhne, Alexander Warnecke et al.

NEURIPS 2025 · poster · arXiv:2401.06122 · 6 citations

Register and [CLS] tokens induce a decoupling of local and global features in large ViTs

Alexander Lappe, Martin Giese

NEURIPS 2025 · poster · 3 citations

Self-Assembling Graph Perceptrons

Jialong Chen, Tong Wang, Bowen Deng et al.

NEURIPS 2025 · spotlight

SHAP zero Explains Biological Sequence Models with Near-zero Marginal Cost for Future Queries

Darin Tsui, Aryan Musharaf, Yigit Efe Erginbas et al.

NEURIPS 2025 · poster · arXiv:2410.19236 · 2 citations

Smoothed Differentiation Efficiently Mitigates Shattered Gradients in Explanations

Adrian Hill, Neal McKee, Johannes Maeß et al.

NEURIPS 2025 · poster

The Fragile Truth of Saliency: Improving LLM Input Attribution via Attention Bias Optimization

Yihua Zhang, Changsheng Wang, Yiwei Chen et al.

NEURIPS 2025 · spotlight

Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties

Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.

NEURIPS 2025 · poster · arXiv:2506.05744 · 13 citations

Unveiling Concept Attribution in Diffusion Models

Nguyen Hung-Quang, Hoang Phan, Khoa D Doan

NEURIPS 2025 · poster · arXiv:2412.02542 · 4 citations