Anca Dragan

Papers: 25
Total Citations: 84

Papers (25)

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback

ICLR 2025
41
citations

Learning Optimal Advantage from Preferences and Mistaking It for Reward

AAAI 2024, arXiv
15
citations

Context Steering: Controllable Personalization at Inference Time

ICLR 2025
11
citations

Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning

ICLR 2025
8
citations

The Effective Horizon Explains Deep RL Performance in Stochastic Environments

ICLR 2024
5
citations

AssistanceZero: Scalably Solving Assistance Games

ICML 2025
4
citations

Learning to Model the World With Language

ICML 2024
0
citations

Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making

ICML 2024
0
citations

Cooperative Inverse Reinforcement Learning

NeurIPS 2016, arXiv
0
citations

Inverse Reward Design

NeurIPS 2017, arXiv
0
citations

AI Alignment with Changing and Influenceable Reward Functions

ICML 2024
0
citations

Coprocessor Actor Critic: A Model-Based Reinforcement Learning Approach For Adaptive Brain Stimulation

ICML 2024
0
citations

On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference

ICML 2019
0
citations

Learning a Prior over Intent via Meta-Inverse Reinforcement Learning

ICML 2019
0
citations

Where Do You Think You're Going?: Inferring Beliefs about Dynamics from Behavior

NeurIPS 2018
0
citations

On the Utility of Learning about Humans for Human-AI Coordination

NeurIPS 2019
0
citations

Reward-rational (implicit) choice: A unifying formalism for reward learning

NeurIPS 2020
0
citations

AvE: Assistance via Empowerment

NeurIPS 2020
0
citations

Preference learning along multiple criteria: A game-theoretic perspective

NeurIPS 2020
0
citations

Pragmatic Image Compression for Human-in-the-Loop Decision-Making

NeurIPS 2021
0
citations

First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization

NeurIPS 2022
0
citations

Uni[MASK]: Unified Inference in Sequential Decision Problems

NeurIPS 2022
0
citations

Learning to Influence Human Behavior with Offline Reinforcement Learning

NeurIPS 2023
0
citations

Bridging RL Theory and Practice with the Effective Horizon

NeurIPS 2023
0
citations

An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning

ICML 2018
0
citations