Sergey Levine
122 Papers · 3,515 Total Citations

Papers (122)
Unsupervised Learning for Physical Interaction through Video Prediction · NeurIPS 2016 (arXiv) · 1,089 citations
Value Iteration Networks · NeurIPS 2016 (arXiv) · 675 citations
Learning to Poke by Poking: Experiential Learning of Intuitive Physics · NeurIPS 2016 (arXiv) · 595 citations
Backprop KF: Learning Discriminative Deterministic State Estimators · NeurIPS 2016 (arXiv) · 213 citations
Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning · NeurIPS 2017 (arXiv) · 172 citations
EX2: Exploration with Exemplar Models for Deep Reinforcement Learning · NeurIPS 2017 (arXiv) · 160 citations
Guided Policy Search via Approximate Mirror Descent · NeurIPS 2016 · 127 citations
OGBench: Benchmarking Offline Goal-Conditioned RL · ICLR 2025 (arXiv) · 74 citations
METRA: Scalable Unsupervised RL with Metric-Aware Abstraction · ICLR 2024 · 68 citations
Scaling Test-Time Compute Without Verification or RL is Suboptimal · ICML 2025 · 68 citations
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models · ICML 2025 · 63 citations
Flow Q-Learning · ICML 2025 · 52 citations
Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better · NeurIPS 2025 (arXiv) · 46 citations
Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design · ICLR 2025 · 40 citations
RLIF: Interactive Imitation Learning as Reinforcement Learning · ICLR 2024 · 25 citations
Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design · ICML 2025 · 12 citations
Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data · ICLR 2024 · 11 citations
Prioritized Generative Replay · ICLR 2025 · 9 citations
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning · ICLR 2025 · 8 citations
Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration · ICML 2025 · 7 citations
Behavioral Exploration: Learning to Explore via In-Context Adaptation · ICML 2025 · 1 citation
Recurrent Network Models for Human Dynamics · ICCV 2015 · 0 citations
GPLAC: Generalizing Vision-Based Robotic Skills Using Weakly Labeled Images · ICCV 2017 (arXiv) · 0 citations
PRECOG: PREdiction Conditioned on Goals in Visual Multi-Agent Settings · ICCV 2019 · 0 citations
Learning Predictive Models from Observation and Interaction · ECCV 2020 · 0 citations
Feedback Efficient Online Fine-Tuning of Diffusion Models · ICML 2024 · 0 citations
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL · ICML 2024 · 0 citations
Foundation Policies with Hilbert Representations · ICML 2024 · 0 citations
Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings · ICML 2024 · 0 citations
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL · ICML 2024 · 0 citations
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator · ICML 2024 · 0 citations
Prompting is a Double-Edged Sword: Improving Worst-Group Robustness of Foundation Models · ICML 2024 · 0 citations
Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making · ICML 2024 · 0 citations
Learning to Explore in POMDPs with Informational Rewards · ICML 2024 · 0 citations
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs · ICML 2024 · 0 citations
Cognitive Mapping and Planning for Visual Navigation · CVPR 2017 (arXiv) · 0 citations
Sim2Real Viewpoint Invariant Visual Servoing by Recurrent Control · CVPR 2018 · 0 citations
Sim-To-Real via Sim-To-Sim: Data-Efficient Robotic Grasping via Randomized-To-Canonical Adaptation Networks · CVPR 2019 · 0 citations
RL-CycleGAN: Reinforcement Learning Aware Simulation-to-Real · CVPR 2020 · 0 citations
Autonomous Reinforcement Learning via Subgoal Curricula · NeurIPS 2021 · 0 citations
Adaptive Risk Minimization: Learning to Adapt to Domain Shift · NeurIPS 2021 · 0 citations
Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability · NeurIPS 2021 · 0 citations
Which Mutual-Information Representation Learning Objectives are Sufficient for Control? · NeurIPS 2021 · 0 citations
Pragmatic Image Compression for Human-in-the-Loop Decision-Making · NeurIPS 2021 · 0 citations
Robust Predictable Control · NeurIPS 2021 · 0 citations
COMBO: Conservative Offline Model-Based Policy Optimization · NeurIPS 2021 · 0 citations
Imitating Past Successes can be Very Suboptimal · NeurIPS 2022 · 0 citations
Data-Driven Offline Decision-Making via Invariant Representation Learning · NeurIPS 2022 · 0 citations
You Only Live Once: Single-Life Reinforcement Learning · NeurIPS 2022 · 0 citations
Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity · NeurIPS 2022 · 0 citations
Adversarial Unlearning: Reducing Confidence Along Adversarial Directions · NeurIPS 2022 · 0 citations
Mismatched No More: Joint Model-Policy Optimization for Model-Based RL · NeurIPS 2022 · 0 citations
Distributionally Adaptive Meta Reinforcement Learning · NeurIPS 2022 · 0 citations
First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization · NeurIPS 2022 · 0 citations
Object Representations as Fixed Points: Training Iterative Refinement Algorithms with Implicit Differentiation · NeurIPS 2022 · 0 citations
Contrastive Learning as Goal-Conditioned Reinforcement Learning · NeurIPS 2022 · 0 citations
MEMO: Test Time Robustness via Adaptation and Augmentation · NeurIPS 2022 · 0 citations
DASCO: Dual-Generator Adversarial Support Constrained Offline Reinforcement Learning · NeurIPS 2022 · 0 citations
ReDS: Offline RL With Heteroskedastic Datasets via Support Constraints · NeurIPS 2023 · 0 citations
HIQL: Offline Goal-Conditioned RL with Latent States as Actions · NeurIPS 2023 · 0 citations
Learning to Influence Human Behavior with Offline Reinforcement Learning · NeurIPS 2023 · 0 citations
Ignorance is Bliss: Robust Control via Information Gating · NeurIPS 2023 · 0 citations
Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents · NeurIPS 2023 · 0 citations
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning · NeurIPS 2023 · 0 citations
Accelerating Exploration with Unlabeled Prior Data · NeurIPS 2023 · 0 citations
Trust Region Policy Optimization · ICML 2015 · 0 citations
Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization · ICML 2016 · 0 citations
Continuous Deep Q-Learning with Model-based Acceleration · ICML 2016 · 0 citations
Modular Multitask Reinforcement Learning with Policy Sketches · ICML 2017 · 0 citations
Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning · ICML 2017 · 0 citations
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks · ICML 2017 · 0 citations
Reinforcement Learning with Deep Energy-Based Policies · ICML 2017 · 0 citations
Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings · ICML 2018 · 0 citations
Latent Space Policies for Hierarchical Reinforcement Learning · ICML 2018 · 0 citations
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor · ICML 2018 · 0 citations
Regret Minimization for Partially Observable Deep Reinforcement Learning · ICML 2018 · 0 citations
Universal Planning Networks: Learning Generalizable Representations for Visuomotor Control · ICML 2018 · 0 citations
The Mirage of Action-Dependent Baselines in Reinforcement Learning · ICML 2018 · 0 citations
Online Meta-Learning · ICML 2019 · 0 citations
Diagnosing Bottlenecks in Deep Q-learning Algorithms · ICML 2019 · 0 citations
EMI: Exploration with Mutual Information · ICML 2019 · 0 citations
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables · ICML 2019 · 0 citations
Learning a Prior over Intent via Meta-Inverse Reinforcement Learning · ICML 2019 · 0 citations
SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning · ICML 2019 · 0 citations
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models · NeurIPS 2018 · 0 citations
Meta-Reinforcement Learning of Structured Exploration Strategies · NeurIPS 2018 · 0 citations
Visual Memory for Robust Path Following · NeurIPS 2018 · 0 citations
Where Do You Think You're Going?: Inferring Beliefs about Dynamics from Behavior · NeurIPS 2018 · 0 citations
Visual Reinforcement Learning with Imagined Goals · NeurIPS 2018 · 0 citations
Probabilistic Model-Agnostic Meta-Learning · NeurIPS 2018 · 0 citations
Variational Inverse Control with Events: A General Framework for Data-Driven Reward Definition · NeurIPS 2018 · 0 citations
Data-Efficient Hierarchical Reinforcement Learning · NeurIPS 2018 · 0 citations
Compositional Plan Vectors · NeurIPS 2019 · 0 citations
Meta-Learning with Implicit Gradients · NeurIPS 2019 · 0 citations
Search on the Replay Buffer: Bridging Planning and Reinforcement Learning · NeurIPS 2019 · 0 citations
When to Trust Your Model: Model-Based Policy Optimization · NeurIPS 2019 · 0 citations
Causal Confusion in Imitation Learning · NeurIPS 2019 · 0 citations
MCP: Learning Composable Hierarchical Control with Multiplicative Compositional Policies · NeurIPS 2019 · 0 citations
Off-Policy Evaluation via Off-Policy Classification · NeurIPS 2019 · 0 citations
Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction · NeurIPS 2019 · 0 citations
Planning with Goal-Conditioned Policies · NeurIPS 2019 · 0 citations
Guided Meta-Policy Search · NeurIPS 2019 · 0 citations
Unsupervised Curricula for Visual Meta-Reinforcement Learning · NeurIPS 2019 · 0 citations
Wasserstein Dependency Measure for Representation Learning · NeurIPS 2019 · 0 citations
Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model · NeurIPS 2020 · 0 citations
Conservative Q-Learning for Offline Reinforcement Learning · NeurIPS 2020 · 0 citations
Gamma-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction · NeurIPS 2020 · 0 citations
Continual Learning of Control Primitives: Skill Discovery via Reset-Games · NeurIPS 2020 · 0 citations
Model Inversion Networks for Model-Based Optimization · NeurIPS 2020 · 0 citations
Gradient Surgery for Multi-Task Learning · NeurIPS 2020 · 0 citations
One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL · NeurIPS 2020 · 0 citations
Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design · NeurIPS 2020 · 0 citations
MOPO: Model-based Offline Policy Optimization · NeurIPS 2020 · 0 citations
Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement · NeurIPS 2020 · 0 citations
Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors · NeurIPS 2020 · 0 citations
DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction · NeurIPS 2020 · 0 citations
Bayesian Adaptation for Covariate Shift · NeurIPS 2021 · 0 citations
Offline Reinforcement Learning as One Big Sequence Modeling Problem · NeurIPS 2021 · 0 citations
Information is Power: Intrinsic Control via Information Capture · NeurIPS 2021 · 0 citations
Conservative Data Sharing for Multi-Task Offline Reinforcement Learning · NeurIPS 2021 · 0 citations
Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification · NeurIPS 2021 · 0 citations
Outcome-Driven Reinforcement Learning via Variational Inference · NeurIPS 2021 · 0 citations