Anca Dragan
10
Papers
84
Total Citations
Papers (10)
On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
ICLR 2025
41
citations
Learning Optimal Advantage from Preferences and Mistaking It for Reward
AAAI 2024 (arXiv)
15
citations
Context Steering: Controllable Personalization at Inference Time
ICLR 2025
11
citations
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning
ICLR 2025
8
citations
The Effective Horizon Explains Deep RL Performance in Stochastic Environments
ICLR 2024
5
citations
AssistanceZero: Scalably Solving Assistance Games
ICML 2025
4
citations
Coprocessor Actor Critic: A Model-Based Reinforcement Learning Approach For Adaptive Brain Stimulation
ICML 2024
0
citations
AI Alignment with Changing and Influenceable Reward Functions
ICML 2024
0
citations
Learning to Model the World With Language
ICML 2024
0
citations
Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making
ICML 2024
0
citations