Xian Li

Papers: 6

Total Citations: 111

Papers (6)

Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge

NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models

The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements

MEMORYLLM: Towards Self-Updatable Large Language Models

Self-Rewarding Language Models