Weidi Xie

18 Papers
193 Total Citations

Papers (18)

Grounded Question-Answering in Long Egocentric Videos

CVPR 2024, arXiv
46
citations

AutoAD III: The Prequel – Back to the Pixels

CVPR 2024
33
citations

Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos

AAAI 2025, arXiv
25
citations

Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation

CVPR 2025, arXiv
22
citations

Track-On: Transformer-based Online Point Tracking with Memory

ICLR 2025, arXiv
16
citations

Towards Universal Soccer Video Understanding

CVPR 2025
14
citations

Multi-Sentence Grounding for Long-term Instructional Video

ECCV 2024, arXiv
12
citations

Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning

ICLR 2025, arXiv
11
citations

Made to Order: Discovering monotonic temporal changes via self-supervised video ordering

ECCV 2024, arXiv
8
citations

Learning Streaming Video Representation via Multitask Training

ICCV 2025, arXiv
3
citations

Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation

ICCV 2025, arXiv
3
citations

LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant

CVPR 2025
0
citations

Retrieval-Augmented Egocentric Video Captioning

CVPR 2024
0
citations

Amodal Ground Truth and Completion in the Wild

CVPR 2024
0
citations

Object-centric Video Question Answering with Visual Grounding and Referring

ICCV 2025
0
citations

Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models

CVPR 2024
0
citations

InstaGen: Enhancing Object Detection by Training on Synthetic Dataset

CVPR 2024
0
citations

MRGen: Segmentation Data Engine For Underrepresented MRI Modalities

ICCV 2025
0
citations