NeurIPS "model interpretability" Papers
7 papers found
Cognitive Mirrors: Exploring the Diverse Functional Roles of Attention Heads in LLM Reasoning
Xueqi Ma, Jun Wang, Yanbei Jiang et al.
NeurIPS 2025 | poster | arXiv:2512.10978 | 1 citation
Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning
NeurIPS 2025 | arXiv:2506.02867
LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching
Zhuo Cao, Xuan Zhao, Lena Krieger et al.
NeurIPS 2025 | poster | arXiv:2510.14623 | 1 citation
Manipulating Feature Visualizations with Gradient Slingshots
Dilyara Bareeva, Marina Höhne, Alexander Warnecke et al.
NeurIPS 2025 | poster | arXiv:2401.06122 | 6 citations
SHAP zero Explains Biological Sequence Models with Near-zero Marginal Cost for Future Queries
Darin Tsui, Aryan Musharaf, Yigit Efe Erginbas et al.
NeurIPS 2025 | poster | arXiv:2410.19236 | 2 citations
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.
NeurIPS 2025 | poster | arXiv:2506.05744 | 13 citations
Unveiling Concept Attribution in Diffusion Models
Nguyen Hung-Quang, Hoang Phan, Khoa D Doan
NeurIPS 2025 | poster | arXiv:2412.02542 | 4 citations