Size Wu

Papers: 7

Total Citations: 288

Papers (7)

OMG-Seg: Is One Model Good Enough For All Segmentation?

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

CLIM: Contrastive Language-Image Mosaic for Region Representation

F-LMM: Grounding Frozen Large Multimodal Models

Aligning Bag of Regions for Open-Vocabulary Object Detection

Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images