2025 by Jiajun Xu Papers
2 papers found
RBench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
Meng-Hao Guo, Jiajun Xu, Yi Zhang et al.
ICML 2025poster
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
Xeron Du, Yifan Yao, Kaijing Ma et al.
NeurIPS 2025posterarXiv:2502.14739
118
citations