"markov decision processes" Papers

23 papers found

Actions Speak Louder Than Words: Rate-Reward Trade-off in Markov Decision Processes

Haotian Wu, Gongpu Chen, Deniz Gunduz

ICLR 2025 · poster · arXiv:2502.03335 · 7 citations

A Generalized Bisimulation Metric of State Similarity between Markov Decision Processes: From Theoretical Propositions to Applications

Zhenyu Tao, Wei Xu, Xiaohu You

NeurIPS 2025 · poster · arXiv:2509.18714 · 2 citations

Beyond Scalar Rewards: An Axiomatic Framework for Lexicographic MDPs

Mehran Shakerinava, Siamak Ravanbakhsh, Adam Oberman

NeurIPS 2025 · spotlight · arXiv:2505.12049

Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds

Hao Liang, Zhiquan Luo

NeurIPS 2025 · poster · arXiv:2210.14051 · 18 citations

CLDyB: Towards Dynamic Benchmarking for Continual Learning with Pre-trained Models

Shengzhuang Chen, Yikai Liao, Xiaoxiao Sun et al.

ICLR 2025 · poster · arXiv:2503.04655 · 1 citation

Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics

Runzhe Wu, Ayush Sekhari, Akshay Krishnamurthy et al.

ICLR 2025 · poster · arXiv:2406.11810 · 3 citations

Efficient Preference-Based Reinforcement Learning: Randomized Exploration meets Experimental Design

Andreas Schlaginhaufen, Reda Ouhamma, Maryam Kamgarpour

NeurIPS 2025 · poster · arXiv:2506.09508 · 1 citation

Non-convex entropic mean-field optimization via Best Response flow

Razvan-Andrei Lascu, Mateusz Majka

NeurIPS 2025 · poster · arXiv:2505.22760 · 1 citation

No-Regret Thompson Sampling for Finite-Horizon Markov Decision Processes with Gaussian Processes

Jasmine Bayrooti, Sattar Vakili, Amanda Prorok et al.

NeurIPS 2025 · oral · arXiv:2510.20725

On the Convergence of Single-Timescale Actor-Critic

Navdeep Kumar, Priyank Agrawal, Giorgia Ramponi et al.

NeurIPS 2025 · poster · arXiv:2410.08868 · 1 citation

REINFORCE Converges to Optimal Policies with Any Learning Rate

Samuel Robertson, Thang Chu, Bo Dai et al.

NeurIPS 2025 · poster

Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data

Rui Miao, Babak Shahbaba, Annie Qu

NeurIPS 2025 · poster · arXiv:2505.09496 · 1 citation

SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation

Jongmin Lee, Meiqi Sun, Pieter Abbeel

ICLR 2025 · poster · arXiv:2512.10042

AI Alignment with Changing and Influenceable Reward Functions

Micah Carroll, Davis Foote, Anand Siththaranjan et al.

ICML 2024 · poster

Efficient Exploration in Average-Reward Constrained Reinforcement Learning: Achieving Near-Optimal Regret With Posterior Sampling

Danil Provodin, Maurits Kaptein, Mykola Pechenizkiy

ICML 2024 · poster

Geometric Active Exploration in Markov Decision Processes: the Benefit of Abstraction

Riccardo De Santi, Federico Arangath Joseph, Noah Liniger et al.

ICML 2024 · poster · arXiv:2407.13364

Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning? A Theoretical Perspective

Lei Zhao, Mengdi Wang, Yu Bai

ICML 2024 · poster

Model-Free Robust $\phi$-Divergence Reinforcement Learning Using Both Offline and Online Data

Kishan Panaganti, Adam Wierman, Eric Mazumdar

ICML 2024 · poster

On The Statistical Complexity of Offline Decision-Making

Thanh Nguyen-Tang, Raman Arora

ICML 2024 · poster · arXiv:2501.06339

Optimizing Local Satisfaction of Long-Run Average Objectives in Markov Decision Processes

David Klaška, Antonín Kučera, Vojtěch Kůr et al.

AAAI 2024 · paper · arXiv:2312.12325 · 1 citation

SaVeR: Optimal Data Collection Strategy for Safe Policy Evaluation in Tabular MDP

Subhojyoti Mukherjee, Josiah Hanna, Robert Nowak

ICML 2024 · poster · arXiv:2406.02165

Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation

Fengdi Che, Chenjun Xiao, Jincheng Mei et al.

ICML 2024 · oral

Test-Time Regret Minimization in Meta Reinforcement Learning

Mirco Mutti, Aviv Tamar

ICML 2024 · poster