Wentao Liu

Papers: 10

Total Citations: 206

Papers (10)

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

CLIM: Contrastive Language-Image Mosaic for Region Representation

F-LMM: Grounding Frozen Large Multimodal Models

AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks

UniFS: Universal Few-shot Instance Perception with Point Representations

NADER: Neural Architecture Design via Multi-Agent Collaboration

ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries

Unsupervised Continual Domain Shift Learning with Multi-Prototype Modeling

Leveraging Frame Affinity for sRGB-to-RAW Video De-rendering