NeurIPS "sparse autoencoders" Papers
5 papers found
Among Us: A Sandbox for Measuring and Detecting Agentic Deception
Satvik Golechha, Adrià Garriga-Alonso
NeurIPS 2025spotlightarXiv:2504.04072
7
citations
Large Language Models Think Too Fast To Explore Effectively
Lan Pan, Hanbo Xie, Robert Wilson
NeurIPS 2025posterarXiv:2501.18009
6
citations
Representation Consistency for Accurate and Coherent LLM Answer Aggregation
Junqi Jiang, Tom Bewley, Salim I. Amoukou et al.
NeurIPS 2025posterarXiv:2506.21590
2
citations
Revising and Falsifying Sparse Autoencoder Feature Explanations
George Ma, Samuel Pfrommer, Somayeh Sojoudi
NeurIPS 2025poster
SparseMVC: Probing Cross-view Sparsity Variations for Multi-view Clustering
Ruimeng Liu, Xin Zou, Chang Tang et al.
NeurIPS 2025spotlight