David Krueger
6
Papers
30
Total Citations
Papers (6)
Pitfalls of Evidence-Based AI Policy
ICLR 2025, arXiv
14
citations
Detecting High-Stakes Interactions with Activation Probes
NeurIPS 2025
13
citations
From Dormant to Deleted: Tamper-Resistant Unlearning Through Weight-Space Regularization
NeurIPS 2025
1
citation
Input Space Mode Connectivity in Deep Neural Networks
ICLR 2025
1
citation
PoisonBench: Assessing Language Model Vulnerability to Poisoned Preference Data
ICML 2025
1
citation
Implicit meta-learning may lead language models to trust more reliable sources
ICML 2024
0
citations