2025 "markov decision processes" Papers
14 papers found
Actions Speak Louder Than Words: Rate-Reward Trade-off in Markov Decision Processes
Haotian Wu, Gongpu Chen, Deniz Gunduz
ICLR 2025 poster · arXiv:2502.03335
7 citations
A Generalized Bisimulation Metric of State Similarity between Markov Decision Processes: From Theoretical Propositions to Applications
Zhenyu Tao, Wei Xu, Xiaohu You
NeurIPS 2025 poster · arXiv:2509.18714
2 citations
Beyond Scalar Rewards: An Axiomatic Framework for Lexicographic MDPs
Mehran Shakerinava, Siamak Ravanbakhsh, Adam Oberman
NeurIPS 2025 spotlight · arXiv:2505.12049
Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds
Hao Liang, Zhiquan Luo
NeurIPS 2025 poster · arXiv:2210.14051
18 citations
CLDyB: Towards Dynamic Benchmarking for Continual Learning with Pre-trained Models
Shengzhuang Chen, Yikai Liao, Xiaoxiao Sun et al.
ICLR 2025 poster · arXiv:2503.04655
1 citation
Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics
Runzhe Wu, Ayush Sekhari, Akshay Krishnamurthy et al.
ICLR 2025 poster · arXiv:2406.11810
3 citations
Efficient Preference-Based Reinforcement Learning: Randomized Exploration meets Experimental Design
Andreas Schlaginhaufen, Reda Ouhamma, Maryam Kamgarpour
NeurIPS 2025 poster · arXiv:2506.09508
1 citation
Non-convex entropic mean-field optimization via Best Response flow
Razvan-Andrei Lascu, Mateusz Majka
NeurIPS 2025 poster · arXiv:2505.22760
1 citation
No-Regret Thompson Sampling for Finite-Horizon Markov Decision Processes with Gaussian Processes
Jasmine Bayrooti, Sattar Vakili, Amanda Prorok et al.
NeurIPS 2025 oral · arXiv:2510.20725
On the Convergence of Single-Timescale Actor-Critic
Navdeep Kumar, Priyank Agrawal, Giorgia Ramponi et al.
NeurIPS 2025 poster · arXiv:2410.08868
1 citation
REINFORCE Converges to Optimal Policies with Any Learning Rate
Samuel Robertson, Thang Chu, Bo Dai et al.
NeurIPS 2025 poster
Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data
Rui Miao, Babak Shahbaba, Annie Qu
NeurIPS 2025 poster · arXiv:2505.09496
1 citation
SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation
Jongmin Lee, Meiqi Sun, Pieter Abbeel
ICLR 2025 poster · arXiv:2512.10042
Sequential Stochastic Combinatorial Optimization Using Hierarchal Reinforcement Learning
Xinsong Feng, Zihan Yu, Yanhai Xiong et al.
ICLR 2025 poster · arXiv:2502.05537
2 citations