2025 "markov decision processes" Papers

14 papers found

Actions Speak Louder Than Words: Rate-Reward Trade-off in Markov Decision Processes

Haotian Wu, Gongpu Chen, Deniz Gunduz

ICLR 2025 (poster), arXiv:2502.03335, 7 citations

A Generalized Bisimulation Metric of State Similarity between Markov Decision Processes: From Theoretical Propositions to Applications

Zhenyu Tao, Wei Xu, Xiaohu You

NeurIPS 2025 (poster), arXiv:2509.18714, 2 citations

Beyond Scalar Rewards: An Axiomatic Framework for Lexicographic MDPs

Mehran Shakerinava, Siamak Ravanbakhsh, Adam Oberman

NeurIPS 2025 (spotlight), arXiv:2505.12049

Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds

Hao Liang, Zhiquan Luo

NeurIPS 2025 (poster), arXiv:2210.14051, 18 citations

CLDyB: Towards Dynamic Benchmarking for Continual Learning with Pre-trained Models

Shengzhuang Chen, Yikai Liao, Xiaoxiao Sun et al.

ICLR 2025 (poster), arXiv:2503.04655, 1 citation

Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics

Runzhe Wu, Ayush Sekhari, Akshay Krishnamurthy et al.

ICLR 2025 (poster), arXiv:2406.11810, 3 citations

Efficient Preference-Based Reinforcement Learning: Randomized Exploration meets Experimental Design

Andreas Schlaginhaufen, Reda Ouhamma, Maryam Kamgarpour

NeurIPS 2025 (poster), arXiv:2506.09508, 1 citation

Non-convex entropic mean-field optimization via Best Response flow

Razvan-Andrei Lascu, Mateusz Majka

NeurIPS 2025 (poster), arXiv:2505.22760, 1 citation

No-Regret Thompson Sampling for Finite-Horizon Markov Decision Processes with Gaussian Processes

Jasmine Bayrooti, Sattar Vakili, Amanda Prorok et al.

NeurIPS 2025 (oral), arXiv:2510.20725

On the Convergence of Single-Timescale Actor-Critic

Navdeep Kumar, Priyank Agrawal, Giorgia Ramponi et al.

NeurIPS 2025 (poster), arXiv:2410.08868, 1 citation

REINFORCE Converges to Optimal Policies with Any Learning Rate

Samuel Robertson, Thang Chu, Bo Dai et al.

NeurIPS 2025 (poster)

Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data

Rui Miao, Babak Shahbaba, Annie Qu

NeurIPS 2025 (poster), arXiv:2505.09496, 1 citation

SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation

Jongmin Lee, Meiqi Sun, Pieter Abbeel

ICLR 2025 (poster), arXiv:2512.10042

Sequential Stochastic Combinatorial Optimization Using Hierarchal Reinforcement Learning

Xinsong Feng, Zihan Yu, Yanhai Xiong et al.

ICLR 2025 (poster), arXiv:2502.05537, 2 citations