2024 Poster "language model self-improvement" Papers
2 papers found
CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay
Natasha Butt, Blazej Manczak, Auke Wiggers et al.
ICML 2024 poster arXiv:2402.04858
RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Harrison Lee, Samrat Phatale, Hassan Mansoor et al.
ICML 2024 poster arXiv:2309.00267