NEURIPS 2025 "interpretable features" Papers
2 papers found
Dense SAE Latents Are Features, Not Bugs
Xiaoqing Sun, Alessandro Stolfo, Joshua Engels et al.
NEURIPS 2025, poster, arXiv:2506.15679
6 citations
From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit
Valérie Costa, Thomas Fel, Ekdeep S Lubana et al.
NEURIPS 2025, poster, arXiv:2506.03093
10 citations