Poster "deterministic mdps" Papers
2 papers found
$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training
Jin Zhou, Kaiwen Wang, Jonathan Chang et al.
NeurIPS 2025 poster, arXiv:2502.20548
10 citations
Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits
Fan Chen, Zeyu Jia, Alexander Rakhlin et al.
NeurIPS 2025 poster, arXiv:2505.20268
3 citations