Posters by Piotr Stanczyk
2 papers found
BOND: Aligning LLMs with Best-of-N Distillation
Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot-Desenonges et al.
ICLR 2025 · poster · arXiv:2407.14622
50 citations
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes
Rishabh Agarwal, Nino Vieillard, Yongchao Zhou et al.
ICLR 2024 · poster