Bo Zhao

Papers: 8
Total Citations: 288

Papers (8)

Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding

MLVU: Benchmarking Multi-task Long Video Understanding

STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?

Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly

BOOD: Boundary-based Out-Of-Distribution Data Generation

SEGA: A Stepwise Evolution Paradigm for Content-Aware Layout Generation with Design Prior

MMCR: Benchmarking Cross-Source Reasoning in Scientific Papers

Towards Universal Dataset Distillation via Task-Driven Diffusion