Kaiwen Wang
Papers: 4
Total Citations: 17
Papers (4)
$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training
NeurIPS 2025, arXiv
10 citations
Value-Guided Search for Efficient Chain-of-Thought Reasoning
NeurIPS 2025
7 citations
More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning
ICML 2024
0 citations
Switching the Loss Reduces the Cost in Batch Reinforcement Learning
ICML 2024
0 citations