Papers by Robert Tang
5 papers
Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations
Li Hao, He CAO, Bin Feng et al.
NeurIPS 2025 poster
17 citations
DyFlow: Dynamic Workflow Framework for Agentic Reasoning
Yanbo Wang, Zixiang Xu, Yue Huang et al.
NeurIPS 2025 poster
KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation
Jiajun Shi, Jian Yang, Jiaheng Liu et al.
NeurIPS 2025 spotlight
4 citations
SciArena: An Open Evaluation Platform for Non-Verifiable Scientific Literature-Grounded Tasks
Yilun Zhao, Kaiyan Zhang, Tiansheng Hu et al.
NeurIPS 2025 spotlight
WebDancer: Towards Autonomous Information Seeking Agency
Jialong Wu, Baixuan Li, Runnan Fang et al.
NeurIPS 2025 poster, arXiv:2505.22648
81 citations