ICLR Poster "interpretability" Papers
2 papers found
Enhancing Uncertainty Estimation and Interpretability with Bayesian Non-negative Decision Layer
Xinyue Hu, Zhibin Duan, Bo Chen et al.
ICLR 2025, poster, arXiv:2505.22199
2 citations
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
Aleksandar Makelov, Georg Lange, Neel Nanda
ICLR 2025, poster, arXiv:2405.08366
63 citations