Shangzhe Di
Papers: 5
Total Citations: 96
Papers (5)
Grounded Question-Answering in Long Egocentric Videos
CVPR 2024, arXiv
46 citations
Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
AAAI 2025, arXiv
25 citations
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation
CVPR 2025, arXiv
22 citations
Learning Streaming Video Representation via Multitask Training
ICCV 2025, arXiv
3 citations
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
NeurIPS 2025, arXiv
0 citations