Wen Sun
10
Papers
132
Total Citations
Papers (10)
Provable Offline Preference-Based Reinforcement Learning
ICLR 2024
39
citations
Making RL with Preference-based Feedback Efficient via Randomization
ICLR 2024
37
citations
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
ICLR 2025
14
citations
$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training
NeurIPS 2025 (arXiv)
10
citations
Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds
ICLR 2025
10
citations
Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
ICLR 2024
8
citations
Value-Guided Search for Efficient Chain-of-Thought Reasoning
NeurIPS 2025 (arXiv)
7
citations
On Speeding Up Language Model Evaluation
ICLR 2025
4
citations
Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics
ICLR 2025 (arXiv)
3
citations
More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning
ICML 2024
0
citations