XIAOJUAN QI
Papers: 6
Total Citations: 0
Papers (6)
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
ECCV 2024 (arXiv)
0 citations
EA-VTR: Event-Aware Video-Text Retrieval
ECCV 2024 (arXiv)
0 citations
Scaling RL to Long Videos
NeurIPS 2025 (arXiv)
0 citations
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Generation
NeurIPS 2025
0 citations
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
NeurIPS 2025 (arXiv)
0 citations
SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix
ICLR 2025 (arXiv)
0 citations