Wenhai Wang
16
Papers
2,439
Total Citations
Papers (16)
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
CVPR 2024
2,210
citations
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
ECCV 2024
86
citations
ControlLLM: Augment Language Models with Tools by Searching on Graphs
ECCV 2024, arXiv
57
citations
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
ICCV 2025
52
citations
Docopilot: Improving Multimodal Models for Document-Level Understanding
CVPR 2025
14
citations
MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost
ICML 2025
6
citations
Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction
ICCV 2025
6
citations
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
CVPR 2025
5
citations
OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis
NeurIPS 2025
2
citations
Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings
NeurIPS 2025
1
citations
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
ICML 2024
0
citations
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area
AAAI 2025
0
citations
Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting
AAAI 2025
0
citations
AVSegFormer: Audio-Visual Segmentation with Transformer
AAAI 2024, arXiv
0
citations
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
CVPR 2024
0
citations
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
CVPR 2025
0
citations