Difei Gao
6 papers, 262 total citations
Papers (6)
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
CVPR 2025
123
citations
VideoLLM-online: Online Video Large Language Model for Streaming Video
CVPR 2024
109
citations
AssistGUI: Task-Oriented PC Graphical User Interface Automation
CVPR 2024
18
citations
Learning Video Context as Interleaved Multimodal Sequences
ECCV 2024, arXiv
12
citations
Factorized Learning for Temporally Grounded Video-Language Models
ICCV 2025
0
citations
ViT-Lens: Towards Omni-modal Representations
CVPR 2024
0
citations