NeurIPS Poster "model interpretability" Papers
9 papers found
Cognitive Mirrors: Exploring the Diverse Functional Roles of Attention Heads in LLM Reasoning
Xueqi Ma, Jun Wang, Yanbei Jiang et al.
NeurIPS 2025 poster arXiv:2512.10978
1 citation
Dataset Distillation for Pre-Trained Self-Supervised Vision Models
George Cazenavette, Antonio Torralba, Vincent Sitzmann
NeurIPS 2025 poster arXiv:2511.16674
LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching
Zhuo Cao, Xuan Zhao, Lena Krieger et al.
NeurIPS 2025 poster arXiv:2510.14623
1 citation
Manipulating Feature Visualizations with Gradient Slingshots
Dilyara Bareeva, Marina Höhne, Alexander Warnecke et al.
NeurIPS 2025 poster arXiv:2401.06122
6 citations
Register and [CLS] tokens induce a decoupling of local and global features in large ViTs
Alexander Lappe, Martin Giese
NeurIPS 2025 poster
SHAP zero Explains Biological Sequence Models with Near-zero Marginal Cost for Future Queries
Darin Tsui, Aryan Musharaf, Yigit Efe Erginbas et al.
NeurIPS 2025 poster arXiv:2410.19236
2 citations
Smoothed Differentiation Efficiently Mitigates Shattered Gradients in Explanations
Adrian Hill, Neal McKee, Johannes Maeß et al.
NeurIPS 2025 poster
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.
NeurIPS 2025 poster arXiv:2506.05744
13 citations
Unveiling Concept Attribution in Diffusion Models
Nguyen Hung-Quang, Hoang Phan, Khoa D Doan
NeurIPS 2025 poster arXiv:2412.02542
4 citations