Anca Dragan

10 Papers

84 Total Citations

Papers (10)

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback

Learning Optimal Advantage from Preferences and Mistaking It for Reward

Context Steering: Controllable Personalization at Inference Time

Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning

The Effective Horizon Explains Deep RL Performance in Stochastic Environments

AssistanceZero: Scalably Solving Assistance Games

Coprocessor Actor Critic: A Model-Based Reinforcement Learning Approach For Adaptive Brain Stimulation

AI Alignment with Changing and Influenceable Reward Functions

Learning to Model the World With Language

Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making