Wenwei Zhang

25 Papers

522 Total Citations

Papers (25)

LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D Capabilities

OMG-Seg: Is One Model Good Enough For All Segmentation?

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction

Unified Human-Scene Interaction via Prompted Chain-of-Contacts

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

CLIM: Contrastive Language-Image Mosaic for Region Representation

F-LMM: Grounding Frozen Large Multimodal Models

Rethinking Verification for LLM Code Generation: From Generation to Testing

Dense Distinct Query for End-to-End Object Detection

Robust Multi-Modality Multi-Object Tracking

Robo3D: Towards Robust and Reliable 3D Perception against Corruptions

Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation

Side-Aware Boundary Localization for More Precise Object Detection

Dense Siamese Network for Dense Unsupervised Learning

Seesaw Loss for Long-Tailed Instance Segmentation

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Can AI Assistants Know What They Don't Know?

EcoNAS: Finding Proxies for Economical Neural Architecture Search

Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation

Aligning Bag of Regions for Open-Vocabulary Object Detection

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

K-Net: Towards Unified Image Segmentation

Segment Any Point Cloud Sequences by Distilling Vision Foundation Models

OV-PARTS: Towards Open-Vocabulary Part Segmentation