Weidi Xie

43
Papers
374
Total Citations

Papers (43)

Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval

ECCV 2020
181
citations

Grounded Question-Answering in Long Egocentric Videos

CVPR 2024
46
citations

AutoAD III: The Prequel – Back to the Pixels

CVPR 2024
33
citations

Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos

AAAI 2025
25
citations

Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation

CVPR 2025
22
citations

Track-On: Transformer-based Online Point Tracking with Memory

ICLR 2025
16
citations

Towards Universal Soccer Video Understanding

CVPR 2025
14
citations

Multi-Sentence Grounding for Long-term Instructional Video

ECCV 2024
12
citations

Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning

ICLR 2025
11
citations

Made to Order: Discovering monotonic temporal changes via self-supervised video ordering

ECCV 2024
8
citations

Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation

ICCV 2025
3
citations

Learning Streaming Video Representation via Multitask Training

ICCV 2025
3
citations

Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision

CVPR 2023
0
citations

Collaboration Helps Camera Overtake LiDAR in 3D Detection

CVPR 2023
0
citations

OvarNet: Towards Open-Vocabulary Object Attribute Recognition

CVPR 2023
0
citations

AutoAD: Movie Description in Context

CVPR 2023
0
citations

Self-Supervised Video Object Segmentation by Motion Grouping

ICCV 2021
0
citations

MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-ray Diagnosis

ICCV 2023
0
citations

AutoAD II: The Sequel - Who, When, and What in Movie Audio Description

ICCV 2023
0
citations

Joint-Relation Transformer for Multi-Person Motion Prediction

ICCV 2023
0
citations

The Making and Breaking of Camouflage

ICCV 2023
0
citations

LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant

CVPR 2025
0
citations

Open-vocabulary Object Segmentation with Diffusion Models

ICCV 2023
0
citations

Memory-augmented Dense Predictive Coding for Video Representation Learning

ECCV 2020
0
citations

PromptDet: Towards Open-Vocabulary Detection Using Uncurated Images

ECCV 2022
0
citations

Prompting Visual-Language Models for Efficient Video Understanding

ECCV 2022
0
citations

Towards Open-Vocabulary Video Instance Segmentation

ICCV 2023
0
citations

Object-centric Video Question Answering with Visual Grounding and Referring

ICCV 2025
0
citations

MRGen: Segmentation Data Engine For Underrepresented MRI Modalities

ICCV 2025
0
citations

Retrieval-Augmented Egocentric Video Captioning

CVPR 2024
0
citations

Amodal Ground Truth and Completion in the Wild

CVPR 2024
0
citations

Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models

CVPR 2024
0
citations

InstaGen: Enhancing Object Detection by Training on Synthetic Dataset

CVPR 2024
0
citations

MAST: A Memory-Augmented Self-Supervised Tracker

CVPR 2020
0
citations

Localizing Visual Sounds the Hard Way

CVPR 2021
0
citations

Temporal Alignment Networks for Long-Term Video

CVPR 2022
0
citations

It's About Time: Analog Clock Reading in the Wild

CVPR 2022
0
citations

Label, Verify, Correct: A Simple Few Shot Object Detection Method

CVPR 2022
0
citations

Self-supervised Co-Training for Video Representation Learning

NeurIPS 2020
0
citations

Associating Objects and Their Effects in Video through Coordination Games

NeurIPS 2022
0
citations

Segmenting Moving Objects via an Object-Centric Layered Representation

NeurIPS 2022
0
citations

ReCo: Retrieve and Co-segment for Zero-shot Transfer

NeurIPS 2022
0
citations

Self-supervised Object-Centric Learning for Videos

NeurIPS 2023
0
citations