ICLR 2025 "model interpretability" Papers
6 papers found
AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution
Fengyuan Liu, Nikhil Kandpal, Colin Raffel
ICLR 2025 · poster · arXiv:2411.15102
12 citations
Concept Bottleneck Language Models For Protein Design
Aya Ismail, Tuomas Oikarinen, Amy Wang et al.
ICLR 2025 · poster · arXiv:2411.06090
13 citations
Data-centric Prediction Explanation via Kernelized Stein Discrepancy
Mahtab Sarvmaili, Hassan Sajjad, Ga Wu
ICLR 2025 · poster · arXiv:2403.15576
2 citations
Discovering Influential Neuron Path in Vision Transformers
Yifan Wang, Yifei Liu, Yingdong Shi et al.
ICLR 2025 · poster · arXiv:2503.09046
4 citations
From Search to Sampling: Generative Models for Robust Algorithmic Recourse
Prateek Garg, Lokesh Nagalapatti, Sunita Sarawagi
ICLR 2025 · poster · arXiv:2505.07351
2 citations
Looking Inward: Language Models Can Learn About Themselves by Introspection
Felix Jedidja Binder, James Chua, Tomek Korbak et al.
ICLR 2025 · oral · arXiv:2410.13787
40 citations