Poster Papers Matching "interpretability"
3 papers found
Enhancing Uncertainty Estimation and Interpretability with Bayesian Non-negative Decision Layer
Xinyue Hu, Zhibin Duan, Bo Chen et al.
ICLR 2025 · arXiv:2505.22199 · 3 citations
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
Aleksandar Makelov, Georg Lange, Neel Nanda
ICLR 2025 · arXiv:2405.08366 · 65 citations
CF-OPT: Counterfactual Explanations for Structured Prediction
Germain Vivier-Ardisson, Alexandre Forel, Axel Parmentier et al.
ICML 2024 · arXiv:2405.18293 · 3 citations