NEURIPS Poster "model interpretability" Papers
14 papers found
Additive Models Explained: A Computational Complexity Approach
Shahaf Bassan, Michal Moshkovitz, Guy Katz
NEURIPS 2025posterarXiv:2510.21292
1
citations
Cognitive Mirrors: Exploring the Diverse Functional Roles of Attention Heads in LLM Reasoning
Xueqi Ma, Jun Wang, Yanbei Jiang et al.
NEURIPS 2025posterarXiv:2512.10978
2
citations
Dataset Distillation for Pre-Trained Self-Supervised Vision Models
George Cazenavette, Antonio Torralba, Vincent Sitzmann
NEURIPS 2025posterarXiv:2511.16674
1
citations
DATE-LM: Benchmarking Data Attribution Evaluation for Large Language Models
Cathy Jiao, Yijun Pan, Emily Xiao et al.
NEURIPS 2025posterarXiv:2507.09424
Defining and Discovering Hyper-meta-paths for Heterogeneous Hypergraphs
Yaming Yang, Ziyu Zheng, Weigang Lu et al.
NEURIPS 2025poster
Dense SAE Latents Are Features, Not Bugs
Xiaoqing Sun, Alessandro Stolfo, Joshua Engels et al.
NEURIPS 2025posterarXiv:2506.15679
6
citations
LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching
Zhuo Cao, Xuan Zhao, Lena Krieger et al.
NEURIPS 2025posterarXiv:2510.14623
1
citations
Localizing Knowledge in Diffusion Transformers
Arman Zarei, Samyadeep Basu, Keivan Rezaei et al.
NEURIPS 2025posterarXiv:2505.18832
1
citations
Manipulating Feature Visualizations with Gradient Slingshots
Dilyara Bareeva, Marina Höhne, Alexander Warnecke et al.
NEURIPS 2025posterarXiv:2401.06122
6
citations
Register and [CLS] tokens induce a decoupling of local and global features in large ViTs
Alexander Lappe, Martin Giese
NEURIPS 2025poster
3
citations
SHAP zero Explains Biological Sequence Models with Near-zero Marginal Cost for Future Queries
Darin Tsui, Aryan Musharaf, Yigit Efe Erginbas et al.
NEURIPS 2025posterarXiv:2410.19236
2
citations
Smoothed Differentiation Efficiently Mitigates Shattered Gradients in Explanations
Adrian Hill, Neal McKee, Johannes Maeß et al.
NEURIPS 2025poster
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.
NEURIPS 2025posterarXiv:2506.05744
13
citations
Unveiling Concept Attribution in Diffusion Models
Nguyen Hung-Quang, Hoang Phan, Khoa D Doan
NEURIPS 2025posterarXiv:2412.02542
4
citations