Xingyi Zhou
Papers: 16
Total citations: 27
Papers (16)
Distilling Vision-Language Models on Millions of Videos
CVPR 2024
20
citations
Dense Video Object Captioning from Disjoint Supervision
ICLR 2025, arXiv
7
citations
Pixel-Aligned Language Model
CVPR 2024
0
citations
Bottom-Up Object Detection by Grouping Extreme and Center Points
CVPR 2019
0
citations
Center-Based 3D Object Detection and Tracking
CVPR 2021, arXiv
0
citations
Global Tracking Transformers
CVPR 2022, arXiv
0
citations
Simple Multi-Dataset Detection
CVPR 2022
0
citations
How Can Objects Help Action Recognition?
CVPR 2023
0
citations
Towards 3D Human Pose Estimation in the Wild: A Weakly-Supervised Approach
ICCV 2017, arXiv
0
citations
Tracking Objects as Points
ECCV 2020
0
citations
Detecting Twenty-Thousand Classes Using Image-Level Supervision
ECCV 2022
0
citations
Visual Lexicon: Rich Image Features in Language Space
CVPR 2025
0
citations
Streaming Dense Video Captioning
CVPR 2024
0
citations
Multimodal Virtual Point 3D Detection
NeurIPS 2021
0
citations
Does Visual Pretraining Help End-to-End Reasoning?
NeurIPS 2023
0
citations
DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model
NeurIPS 2023
0
citations