Sergey Levine

122 Papers · 3,515 Total Citations

Papers (122)

Unsupervised Learning for Physical Interaction through Video Prediction
NeurIPS 2016 · 1,089 citations

Value Iteration Networks
NeurIPS 2016 · 675 citations

Learning to Poke by Poking: Experiential Learning of Intuitive Physics
NeurIPS 2016 · 595 citations

Backprop KF: Learning Discriminative Deterministic State Estimators
NeurIPS 2016 · 213 citations

Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning
NeurIPS 2017 · 172 citations

EX2: Exploration with Exemplar Models for Deep Reinforcement Learning
NeurIPS 2017 · 160 citations

Guided Policy Search via Approximate Mirror Descent
NeurIPS 2016 · 127 citations

OGBench: Benchmarking Offline Goal-Conditioned RL
ICLR 2025 · 74 citations

METRA: Scalable Unsupervised RL with Metric-Aware Abstraction
ICLR 2024 · 68 citations

Scaling Test-Time Compute Without Verification or RL is Suboptimal
ICML 2025 · 68 citations

LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
ICML 2025 · 63 citations

Flow Q-Learning
ICML 2025 · 52 citations

Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better
NeurIPS 2025 · 46 citations

Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design
ICLR 2025 · 40 citations

RLIF: Interactive Imitation Learning as Reinforcement Learning
ICLR 2024 · 25 citations

Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design
ICML 2025 · 12 citations

Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data
ICLR 2024 · 11 citations

Prioritized Generative Replay
ICLR 2025 · 9 citations

Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning
ICLR 2025 · 8 citations

Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration
ICML 2025 · 7 citations

Behavioral Exploration: Learning to Explore via In-Context Adaptation
ICML 2025 · 1 citation

Recurrent Network Models for Human Dynamics
ICCV 2015 · 0 citations

GPLAC: Generalizing Vision-Based Robotic Skills Using Weakly Labeled Images
ICCV 2017 · 0 citations

PRECOG: PREdiction Conditioned on Goals in Visual Multi-Agent Settings
ICCV 2019 · 0 citations

Learning Predictive Models from Observation and Interaction
ECCV 2020 · 0 citations

Feedback Efficient Online Fine-Tuning of Diffusion Models
ICML 2024 · 0 citations

ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
ICML 2024 · 0 citations

Foundation Policies with Hilbert Representations
ICML 2024 · 0 citations

Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings
ICML 2024 · 0 citations

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
ICML 2024 · 0 citations

Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
ICML 2024 · 0 citations

Prompting is a Double-Edged Sword: Improving Worst-Group Robustness of Foundation Models
ICML 2024 · 0 citations

Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making
ICML 2024 · 0 citations

Learning to Explore in POMDPs with Informational Rewards
ICML 2024 · 0 citations

PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
ICML 2024 · 0 citations

Cognitive Mapping and Planning for Visual Navigation
CVPR 2017 · 0 citations

Sim2Real Viewpoint Invariant Visual Servoing by Recurrent Control
CVPR 2018 · 0 citations

Sim-To-Real via Sim-To-Sim: Data-Efficient Robotic Grasping via Randomized-To-Canonical Adaptation Networks
CVPR 2019 · 0 citations

RL-CycleGAN: Reinforcement Learning Aware Simulation-to-Real
CVPR 2020 · 0 citations

Autonomous Reinforcement Learning via Subgoal Curricula
NeurIPS 2021 · 0 citations

Adaptive Risk Minimization: Learning to Adapt to Domain Shift
NeurIPS 2021 · 0 citations

Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability
NeurIPS 2021 · 0 citations

Which Mutual-Information Representation Learning Objectives are Sufficient for Control?
NeurIPS 2021 · 0 citations

Pragmatic Image Compression for Human-in-the-Loop Decision-Making
NeurIPS 2021 · 0 citations

Robust Predictable Control
NeurIPS 2021 · 0 citations

COMBO: Conservative Offline Model-Based Policy Optimization
NeurIPS 2021 · 0 citations

Imitating Past Successes can be Very Suboptimal
NeurIPS 2022 · 0 citations

Data-Driven Offline Decision-Making via Invariant Representation Learning
NeurIPS 2022 · 0 citations

You Only Live Once: Single-Life Reinforcement Learning
NeurIPS 2022 · 0 citations

Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity
NeurIPS 2022 · 0 citations

Adversarial Unlearning: Reducing Confidence Along Adversarial Directions
NeurIPS 2022 · 0 citations

Mismatched No More: Joint Model-Policy Optimization for Model-Based RL
NeurIPS 2022 · 0 citations

Distributionally Adaptive Meta Reinforcement Learning
NeurIPS 2022 · 0 citations

First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization
NeurIPS 2022 · 0 citations

Object Representations as Fixed Points: Training Iterative Refinement Algorithms with Implicit Differentiation
NeurIPS 2022 · 0 citations

Contrastive Learning as Goal-Conditioned Reinforcement Learning
NeurIPS 2022 · 0 citations

MEMO: Test Time Robustness via Adaptation and Augmentation
NeurIPS 2022 · 0 citations

DASCO: Dual-Generator Adversarial Support Constrained Offline Reinforcement Learning
NeurIPS 2022 · 0 citations

ReDS: Offline RL With Heteroskedastic Datasets via Support Constraints
NeurIPS 2023 · 0 citations

HIQL: Offline Goal-Conditioned RL with Latent States as Actions
NeurIPS 2023 · 0 citations

Learning to Influence Human Behavior with Offline Reinforcement Learning
NeurIPS 2023 · 0 citations

Ignorance is Bliss: Robust Control via Information Gating
NeurIPS 2023 · 0 citations

Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents
NeurIPS 2023 · 0 citations

Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
NeurIPS 2023 · 0 citations

Accelerating Exploration with Unlabeled Prior Data
NeurIPS 2023 · 0 citations

Trust Region Policy Optimization
ICML 2015 · 0 citations

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
ICML 2016 · 0 citations

Continuous Deep Q-Learning with Model-based Acceleration
ICML 2016 · 0 citations

Modular Multitask Reinforcement Learning with Policy Sketches
ICML 2017 · 0 citations

Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning
ICML 2017 · 0 citations

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
ICML 2017 · 0 citations

Reinforcement Learning with Deep Energy-Based Policies
ICML 2017 · 0 citations

Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings
ICML 2018 · 0 citations

Latent Space Policies for Hierarchical Reinforcement Learning
ICML 2018 · 0 citations

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
ICML 2018 · 0 citations

Regret Minimization for Partially Observable Deep Reinforcement Learning
ICML 2018 · 0 citations

Universal Planning Networks: Learning Generalizable Representations for Visuomotor Control
ICML 2018 · 0 citations

The Mirage of Action-Dependent Baselines in Reinforcement Learning
ICML 2018 · 0 citations

Online Meta-Learning
ICML 2019 · 0 citations

Diagnosing Bottlenecks in Deep Q-learning Algorithms
ICML 2019 · 0 citations

EMI: Exploration with Mutual Information
ICML 2019 · 0 citations

Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
ICML 2019 · 0 citations

Learning a Prior over Intent via Meta-Inverse Reinforcement Learning
ICML 2019 · 0 citations

SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning
ICML 2019 · 0 citations

Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
NeurIPS 2018 · 0 citations

Meta-Reinforcement Learning of Structured Exploration Strategies
NeurIPS 2018 · 0 citations

Visual Memory for Robust Path Following
NeurIPS 2018 · 0 citations

Where Do You Think You're Going?: Inferring Beliefs about Dynamics from Behavior
NeurIPS 2018 · 0 citations

Visual Reinforcement Learning with Imagined Goals
NeurIPS 2018 · 0 citations

Probabilistic Model-Agnostic Meta-Learning
NeurIPS 2018 · 0 citations

Variational Inverse Control with Events: A General Framework for Data-Driven Reward Definition
NeurIPS 2018 · 0 citations

Data-Efficient Hierarchical Reinforcement Learning
NeurIPS 2018 · 0 citations

Compositional Plan Vectors
NeurIPS 2019 · 0 citations

Meta-Learning with Implicit Gradients
NeurIPS 2019 · 0 citations

Search on the Replay Buffer: Bridging Planning and Reinforcement Learning
NeurIPS 2019 · 0 citations

When to Trust Your Model: Model-Based Policy Optimization
NeurIPS 2019 · 0 citations

Causal Confusion in Imitation Learning
NeurIPS 2019 · 0 citations

MCP: Learning Composable Hierarchical Control with Multiplicative Compositional Policies
NeurIPS 2019 · 0 citations

Off-Policy Evaluation via Off-Policy Classification
NeurIPS 2019 · 0 citations

Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
NeurIPS 2019 · 0 citations

Planning with Goal-Conditioned Policies
NeurIPS 2019 · 0 citations

Guided Meta-Policy Search
NeurIPS 2019 · 0 citations

Unsupervised Curricula for Visual Meta-Reinforcement Learning
NeurIPS 2019 · 0 citations

Wasserstein Dependency Measure for Representation Learning
NeurIPS 2019 · 0 citations

Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model
NeurIPS 2020 · 0 citations

Conservative Q-Learning for Offline Reinforcement Learning
NeurIPS 2020 · 0 citations

Gamma-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction
NeurIPS 2020 · 0 citations

Continual Learning of Control Primitives: Skill Discovery via Reset-Games
NeurIPS 2020 · 0 citations

Model Inversion Networks for Model-Based Optimization
NeurIPS 2020 · 0 citations

Gradient Surgery for Multi-Task Learning
NeurIPS 2020 · 0 citations

One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL
NeurIPS 2020 · 0 citations

Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
NeurIPS 2020 · 0 citations

MOPO: Model-based Offline Policy Optimization
NeurIPS 2020 · 0 citations

Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement
NeurIPS 2020 · 0 citations

Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors
NeurIPS 2020 · 0 citations

DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction
NeurIPS 2020 · 0 citations

Bayesian Adaptation for Covariate Shift
NeurIPS 2021 · 0 citations

Offline Reinforcement Learning as One Big Sequence Modeling Problem
NeurIPS 2021 · 0 citations

Information is Power: Intrinsic Control via Information Capture
NeurIPS 2021 · 0 citations

Conservative Data Sharing for Multi-Task Offline Reinforcement Learning
NeurIPS 2021 · 0 citations

Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification
NeurIPS 2021 · 0 citations

Outcome-Driven Reinforcement Learning via Variational Inference
NeurIPS 2021 · 0 citations