Dale Schuurmans

Papers: 38

Total Citations: 847

Papers (38)

Bridging the Gap Between Value and Policy Based Reinforcement Learning

NeurIPS 2017 (arXiv)

Reward Augmented Maximum Likelihood for Neural Structured Prediction

NeurIPS 2016 (arXiv)

Deep Learning Games

Multi-view Matrix Factorization for Linear Dynamical System Estimation

Plastic Learning with Deep Fourier Features

Improving Large Language Model Planning with Action Sequence Similarity

Position: Video as the New Language for Real-World Decision Making

Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation

Semi-Supervised Zero-Shot Classification With Label Representation Learning

Embedding Inference for Structured Multilabel Prediction

Provable Representation with Efficient Planning for Partially Observable Reinforcement Learning

Escaping the Gravitational Pull of Softmax

Understanding the Effect of Stochasticity in Policy Optimization

Combiner: Full Attention Transformer with Sparse Computation Cost

On the Global Convergence Rates of Decentralized Softmax Gradient Play in Markov Potential Games

The Role of Baselines in Policy Gradient Optimization

Optimal Scaling for Locally Balanced Proposals in Discrete Spaces

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Chain of Thought Imitation with Procedure Cloning

A Simple Decentralized Cross-Entropy Method

Learning Universal Policies via Text-Guided Video Generation

Ordering-based Conditions for Global Convergence of Policy Gradient Methods

Managing Temporal Resolution in Continuous Value Estimation: A Fundamental Trade-off

DISCS: A Benchmark for Discrete Sampling

Smoothed Action Value Functions for Learning Gaussian Policies

Learning to Generalize from Sparse and Underspecified Rewards

Understanding the Impact of Entropy on Policy Optimization

The Value Function Polytope in Reinforcement Learning

Non-delusional Q-learning and value-iteration

A Geometric Perspective on Optimal Representations for Reinforcement Learning

Exponential Family Estimation via Adversarial Dynamics Embedding

Maximum Entropy Monte-Carlo Planning

Surrogate Objectives for Batch Policy Optimization in One-step Decision Making

Invertible Convolutional Flow

Off-Policy Evaluation via the Regularized Lagrangian

CoinDICE: Off-Policy Confidence Interval Estimation

Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration

A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs