Wenhai Wang

16
Papers
2,439
Total Citations

Papers (16)

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

CVPR 2024
2,210
citations

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

ECCV 2024
86
citations

ControlLLM: Augment Language Models with Tools by Searching on Graphs

ECCV 2024, arXiv
57
citations

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

ICCV 2025
52
citations

Docopilot: Improving Multimodal Models for Document-Level Understanding

CVPR 2025
14
citations

MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost

ICML 2025
6
citations

Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction

ICCV 2025
6
citations

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding

CVPR 2025
5
citations

OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis

NeurIPS 2025
2
citations

Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings

NeurIPS 2025
1
citations

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

ICML 2024
0
citations

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

AAAI 2025
0
citations

Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting

AAAI 2025
0
citations

AVSegFormer: Audio-Visual Segmentation with Transformer

AAAI 2024, arXiv
0
citations

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

CVPR 2024
0
citations

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

CVPR 2025
0
citations