Lewei Lu

15 Papers

2,418 Total Citations

Papers (15)

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

ControlLLM: Augment Language Models with Tools by Searching on Graphs

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Docopilot: Improving Multimodal Models for Document-Level Understanding

Weakly Supervised Monocular 3D Detection with a Single-View Image

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding

Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling

NeurIPS 2025 · arXiv

Modeling Continuous Motion for 3D Point Cloud Object Tracking

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction

Masked AutoDecoder is Effective Multi-Task Vision Generalist

Spatial Preference Rewarding for MLLMs Spatial Understanding

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models