Wei Fu

3 Papers

8 Total Citations

Papers (3)

SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

Iteratively Learn Diverse Strategies with State Distance Information

NeurIPS 2023, arXiv