Zijia Zhao

4 papers

14 total citations

papers (4)

Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs

Efficient Motion-Aware Video MLLM

SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models

VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

NeurIPS 2023, arXiv