David Krueger

Papers: 6

Total Citations: 30

Papers (6)

Pitfalls of Evidence-Based AI Policy

Detecting High-Stakes Interactions with Activation Probes

From Dormant to Deleted: Tamper-Resistant Unlearning Through Weight-Space Regularization

Input Space Mode Connectivity in Deep Neural Networks

PoisonBench: Assessing Language Model Vulnerability to Poisoned Preference Data

Implicit meta-learning may lead language models to trust more reliable sources