Wenhai Wang

16 Papers

2,439 Total Citations

Papers (16)

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

ControlLLM: Augment Language Models with Tools by Searching on Graphs

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

Docopilot: Improving Multimodal Models for Document-Level Understanding

MuLan: Adapting Multilingual Diffusion Models for Hundreds of Languages with Negligible Cost

Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding

OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis

Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting

AVSegFormer: Audio-Visual Segmentation with Transformer

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models