Jilan Xu

Papers: 10

Total Citations: 1,016

Papers (10)

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World

CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding

Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning

EgoExoBench: A Benchmark for First- and Third-person View Video Understanding in MLLMs

AOR: Anatomical Ontology-Guided Reasoning for Medical Large Multimodal Model in Chest X-Ray Interpretation

Learning Streaming Video Representation via Multitask Training

Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision

Retrieval-Augmented Egocentric Video Captioning

CREAM: Weakly Supervised Object Localization via Class RE-Activation Mapping