Ge Zhang

17 Papers

615 Total Citations

Papers (17)

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark for Large Language Models

UniIR: Training and Benchmarking Universal Multimodal Information Retrievers

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Training Socially Aligned Language Models on Simulated Social Interactions

General-Reasoner: Advancing LLM Reasoning Across All Domains

OmniBench: Towards The Future of Universal Omni-Language Models

McEval: Massively Multilingual Code Evaluation

Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

Toward Modality Gap: Vision Prototype Learning for Weakly-supervised Semantic Segmentation with CLIP

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Improving Depth Completion via Depth Feature Upsampling

LRRU: Long-short Range Recurrent Updating Networks for Depth Completion

Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits

MARBLE: Music Audio Representation Benchmark for Universal Evaluation