Weidi Xie
43 Papers
374 Total Citations
Papers (43)
Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval
ECCV 2020
181 citations
Grounded Question-Answering in Long Egocentric Videos
CVPR 2024
46 citations
AutoAD III: The Prequel – Back to the Pixels
CVPR 2024
33 citations
Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
AAAI 2025
25 citations
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation
CVPR 2025
22 citations
Track-On: Transformer-based Online Point Tracking with Memory
ICLR 2025
16 citations
Towards Universal Soccer Video Understanding
CVPR 2025
14 citations
Multi-Sentence Grounding for Long-term Instructional Video
ECCV 2024
12 citations
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
ICLR 2025
11 citations
Made to Order: Discovering monotonic temporal changes via self-supervised video ordering
ECCV 2024
8 citations
Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation
ICCV 2025
3 citations
Learning Streaming Video Representation via Multitask Training
ICCV 2025
3 citations
Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision
CVPR 2023
0 citations
Collaboration Helps Camera Overtake LiDAR in 3D Detection
CVPR 2023
0 citations
OvarNet: Towards Open-Vocabulary Object Attribute Recognition
CVPR 2023
0 citations
AutoAD: Movie Description in Context
CVPR 2023
0 citations
Self-Supervised Video Object Segmentation by Motion Grouping
ICCV 2021
0 citations
MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-ray Diagnosis
ICCV 2023
0 citations
AutoAD II: The Sequel - Who, When, and What in Movie Audio Description
ICCV 2023
0 citations
Joint-Relation Transformer for Multi-Person Motion Prediction
ICCV 2023
0 citations
The Making and Breaking of Camouflage
ICCV 2023
0 citations
LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
CVPR 2025
0 citations
Open-vocabulary Object Segmentation with Diffusion Models
ICCV 2023
0 citations
Memory-augmented Dense Predictive Coding for Video Representation Learning
ECCV 2020
0 citations
PromptDet: Towards Open-Vocabulary Detection Using Uncurated Images
ECCV 2022
0 citations
Prompting Visual-Language Models for Efficient Video Understanding
ECCV 2022
0 citations
Towards Open-Vocabulary Video Instance Segmentation
ICCV 2023
0 citations
Object-centric Video Question Answering with Visual Grounding and Referring
ICCV 2025
0 citations
MRGen: Segmentation Data Engine For Underrepresented MRI Modalities
ICCV 2025
0 citations
Retrieval-Augmented Egocentric Video Captioning
CVPR 2024
0 citations
Amodal Ground Truth and Completion in the Wild
CVPR 2024
0 citations
Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models
CVPR 2024
0 citations
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset
CVPR 2024
0 citations
MAST: A Memory-Augmented Self-Supervised Tracker
CVPR 2020
0 citations
Localizing Visual Sounds the Hard Way
CVPR 2021
0 citations
Temporal Alignment Networks for Long-Term Video
CVPR 2022
0 citations
It's About Time: Analog Clock Reading in the Wild
CVPR 2022
0 citations
Label, Verify, Correct: A Simple Few Shot Object Detection Method
CVPR 2022
0 citations
Self-supervised Co-Training for Video Representation Learning
NeurIPS 2020
0 citations
Associating Objects and Their Effects in Video through Coordination Games
NeurIPS 2022
0 citations
Segmenting Moving Objects via an Object-Centric Layered Representation
NeurIPS 2022
0 citations
ReCo: Retrieve and Co-segment for Zero-shot Transfer
NeurIPS 2022
0 citations
Self-supervised Object-Centric Learning for Videos
NeurIPS 2023
0 citations