Enxin Song
4
Papers
602
Total Citations
Papers (4)
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
CVPR 2024
457
citations
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
ICLR 2025, arXiv
102
citations
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
ICLR 2025
43
citations
Bringing RNNs Back to Efficient Open-Ended Video Understanding
ICCV 2025
0
citations