Alekh Agarwal

Papers: 5

Total Citations: 92

Papers (5)

Theoretical guarantees on the best-of-n alignment policy

Design Considerations in Offline Preference-based RL

A Minimaximalist Approach to Reinforcement Learning from Human Feedback

The Non-linear $F$-Design and Applications to Interactive Learning

More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning