Johan Ferret
4 Papers
50 Total Citations
Papers (4)
BOND: Aligning LLMs with Best-of-N Distillation
ICLR 2025
50 citations
WARM: On the Benefits of Weight Averaged Reward Models
ICML 2024
0 citations
RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
ICML 2024
0 citations
There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning
NeurIPS 2021
0 citations