Kaiwen Wang
6 Papers
17 Total Citations
Papers (6)
$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training
NeurIPS 2025, arXiv
10 citations
Value-Guided Search for Efficient Chain-of-Thought Reasoning
NeurIPS 2025
7 citations
More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning
ICML 2024
0 citations
Switching the Loss Reduces the Cost in Batch Reinforcement Learning
ICML 2024
0 citations
Deep Multi-Modal Structural Equations For Causal Effect Estimation With Unstructured Proxies
NeurIPS 2022
0 citations
The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning
NeurIPS 2023
0 citations