Papers by Kongcheng Zhang
2 papers found
Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning
Kongcheng Zhang, Qi Yao, Shunyu Liu et al.
NeurIPS 2025 poster
SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data
Wenkai Fang, Shunyu Liu, Yang Zhou et al.
NeurIPS 2025 poster, arXiv:2505.20347
19 citations