Poster "model faithfulness" Papers
2 papers found
Interpretability Illusions in the Generalization of Simplified Models
Dan Friedman, Andrew Lampinen, Lucas Dixon et al.
ICML 2024, poster, arXiv:2312.03656
Saliency strikes back: How filtering out high frequencies improves white-box explanations
Sabine Muzellec, Thomas Fel, Victor Boutin et al.
ICML 2024, poster, arXiv:2307.09591