NeurIPS "value function estimation" Papers
3 papers found
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Kianté Brantley, Mingyu Chen, Zhaolin Gao et al.
NeurIPS 2025 poster, arXiv:2505.20686
12 citations
In-Context Fully Decentralized Cooperative Multi-Agent Reinforcement Learning
Chao Li, Bingkun BAO, Yang Gao
NeurIPS 2025 poster
Model Selection for Off-policy Evaluation: New Algorithms and Experimental Protocol
Pai Liu, Lingfeng Zhao, Shivangi Agarwal et al.
NeurIPS 2025 poster, arXiv:2502.08021
4 citations