Martin Wattenberg
Papers: 4
Total Citations: 51
Papers (4)
Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models
ICML 2025, 28 citations
ICLR: In-Context Learning of Representations
ICLR 2025, 23 citations
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
ICML 2024, 0 citations
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
ICML 2024, 0 citations