Sangho Lee
9
Papers
136
Total Citations
Papers (9)
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
CVPR 2025
96
citations
One Diffusion to Generate Them All
CVPR 2025
34
citations
Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation
ECCV 2024
6
citations
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action
CVPR 2024
0
citations
A Memory Network Approach for Story-Based Temporal Summarization of 360° Videos
CVPR 2018
0
citations
A Read-Write Memory Network for Movie Story Understanding
ICCV 2017
0
citations
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning
ICCV 2021
0
citations
ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams
CVPR 2025
0
citations
MAMS: Model-Agnostic Module Selection Framework for Video Captioning
AAAI 2025
0
citations