Papers by Caden Juang (3 papers)
Automatically Interpreting Millions of Features in Large Language Models
Gonçalo Paulo, Alex Mallen, Caden Juang et al.
ICML 2025 (poster), arXiv:2410.13928
NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals
Jaden Fiotto-Kaufman, Alexander Loftus, Eric Todd et al.
ICLR 2025 (poster)
Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning
Julian Minder, Clément Dumas, Caden Juang et al.
NeurIPS 2025 (poster)