Poster "language model specialization" Papers
2 papers found
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Huajian Xin, Z.Z. Ren, Junxiao Song et al.
ICLR 2025, poster, arXiv:2408.08152
134 citations
Preserving Diversity in Supervised Fine-Tuning of Large Language Models
Ziniu Li, Congliang Chen, Tian Xu et al.
ICLR 2025, poster, arXiv:2408.16673
33 citations