Mohammad Ghavamzadeh

26
Papers
264
Total Citations

Papers (26)

Safe Policy Improvement by Minimizing Robust Baseline Regret

NeurIPS 2016 · arXiv
140
citations

Conservative Contextual Linear Bandits

NeurIPS 2017 · arXiv
105
citations

Does Thinking More Always Help? Mirage of Test-Time Scaling in Reasoning Models

NeurIPS 2025
19
citations

Bayesian Regret Minimization in Offline Bandits

ICML 2024
0
citations

Policy Gradient for Coherent Risk Measures

NeurIPS 2015
0
citations

Adaptive Sampling for Minimax Fair Classification

NeurIPS 2021
0
citations

Private and Communication-Efficient Algorithms for Entropy Estimation

NeurIPS 2022
0
citations

Robust Reinforcement Learning using Offline Data

NeurIPS 2022
0
citations

Efficient Risk-Averse Reinforcement Learning

NeurIPS 2022
0
citations

Operator Splitting Value Iteration

NeurIPS 2022
0
citations

Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management

NeurIPS 2023
0
citations

Ordering-based Conditions for Global Convergence of Policy Gradient Methods

NeurIPS 2023
0
citations

On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes

NeurIPS 2023
0
citations

DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models

NeurIPS 2023
0
citations

High Confidence Policy Improvement

ICML 2015
0
citations

Active Learning for Accurate Estimation of Linear Models

ICML 2017
0
citations

Bottleneck Conditional Density Estimation

ICML 2017
0
citations

Model-Independent Online Learning for Influence Maximization

ICML 2017
0
citations

Online Learning to Rank in Stochastic Click Models

ICML 2017
0
citations

Path Consistency Learning in Tsallis Entropy Regularized MDPs

ICML 2018
0
citations

More Robust Doubly Robust Off-policy Evaluation

ICML 2018
0
citations

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits

ICML 2019
0
citations

A Block Coordinate Ascent Algorithm for Mean-Variance Optimization

NeurIPS 2018
0
citations

A Lyapunov-based Approach to Safe Reinforcement Learning

NeurIPS 2018
0
citations

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

NeurIPS 2019
0
citations

Online Planning with Lookahead Policies

NeurIPS 2020
0
citations