Sangho Lee

Papers: 6

Total Citations: 136

Papers (6)

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

One Diffusion to Generate Them All

Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation

ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams

MAMS: Model-Agnostic Module Selection Framework for Video Captioning

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action