Yuhao Dong

Papers: 9

Total Citations: 41

Papers (9)

Efficient Inference of Vision Instruction-Following Models with Elastic Cache

Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives

Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding

3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

EgoLife: Towards Egocentric Life Assistant