Wenwei Zhang

Papers: 11

Total Citations: 522

Papers (11)

LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D Capabilities

OMG-Seg: Is One Model Good Enough For All Segmentation?

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction

Unified Human-Scene Interaction via Prompted Chain-of-Contacts

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

CLIM: Contrastive Language-Image Mosaic for Region Representation

F-LMM: Grounding Frozen Large Multimodal Models

Rethinking Verification for LLM Code Generation: From Generation to Testing

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives

Can AI Assistants Know What They Don't Know?