Banghua Zhu

Papers: 8

Total Citations: 384

Papers (8)

From Crowdsourced Data to High-quality Benchmarks: Arena-Hard and Benchbuilder Pipeline

How to Evaluate Reward Models for RLHF

The Effective Horizon Explains Deep RL Performance in Stochastic Environments

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

Doubly-Robust Self-Training

Towards Optimal Caching and Model Selection for Large Model Inference