"math reasoning benchmarks" Papers
2 papers found
$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training
Jin Zhou, Kaiwen Wang, Jonathan Chang et al.
NeurIPS 2025 poster, arXiv:2502.20548
10 citations
MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models
Justin Chih-Yao Chen, Swarnadeep Saha, Elias Stengel-Eskin et al.
ICML 2024 poster