"interpretability" Papers
3 papers found
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
Aleksandar Makelov, Georg Lange, Neel Nanda
ICLR 2025 | poster | arXiv:2405.08366
63 citations
CF-OPT: Counterfactual Explanations for Structured Prediction
Germain Vivier-Ardisson, Alexandre Forel, Axel Parmentier et al.
ICML 2024 | poster
Revisiting Document-Level Relation Extraction with Context-Guided Link Prediction
Monika Jain, Raghava Mutharaju, Ramakanth Kavuluru et al.
AAAI 2024 | paper | arXiv:2401.11800
17 citations