Size Wu

Papers: 5

Total Citations: 288

Papers (5)

OMG-Seg: Is One Model Good Enough For All Segmentation?

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

CLIM: Contrastive Language-Image Mosaic for Region Representation

F-LMM: Grounding Frozen Large Multimodal Models