Kevin Qinghong Lin
16
Papers
720
Total Citations
Papers (16)
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
ICLR 2025 | arXiv
455
citations
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
CVPR 2025 | arXiv
123
citations
VideoLLM-online: Online Video Large Language Model for Streaming Video
CVPR 2024 | arXiv
109
citations
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
CVPR 2025 | arXiv
14
citations
Learning Video Context as Interleaved Multimodal Sequences
ECCV 2024 | arXiv
12
citations
ROICtrl: Boosting Instance Control for Visual Generation
CVPR 2025 | arXiv
7
citations
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
CVPR 2025 | arXiv
0
citations
VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting
AAAI 2025 | arXiv
0
citations
Bootstrapping SparseFormers from Vision Foundation Models
CVPR 2024 | arXiv
0
citations
All in One: Exploring Unified Video-Language Pre-Training
CVPR 2023 | arXiv
0
citations
Affordance Grounding From Demonstration Video To Target Image
CVPR 2023 | arXiv
0
citations
Too Large; Data Reduction for Vision-Language Pre-Training
ICCV 2023 | arXiv
0
citations
UniVTG: Towards Unified Video-Language Temporal Grounding
ICCV 2023 | arXiv
0
citations
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
ICCV 2023 | arXiv
0
citations
Egocentric Video-Language Pretraining
NeurIPS 2022 | arXiv
0
citations
Learning Visual Prior via Generative Pre-Training
NeurIPS 2023 | arXiv
0
citations