Xingyi Zhou

Papers: 5

Total Citations: 27

Papers (5)

Distilling Vision-Language Models on Millions of Videos

Dense Video Object Captioning from Disjoint Supervision

Visual Lexicon: Rich Image Features in Language Space

Streaming Dense Video Captioning

Pixel-Aligned Language Model