Zhe Chen

Papers: 18

Total Citations: 2,500

Papers (18)

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

SimDistill: Simulated Multi-Modal Distillation for BEV 3D Object Detection

Traffic Flow Optimisation for Lifelong Multi-Agent Path Finding

Docopilot: Improving Multimodal Models for Document-Level Understanding

Online Guidance Graph Optimization for Lifelong Multi-Agent Path Finding

Structural Information Guided Multimodal Pre-training for Vehicle-Centric Perception

Incomplete Modality Disentangled Representation for Ophthalmic Disease Grading and Diagnosis

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding

SHeaP: Self-supervised Head Geometry Predictor Learned via 2D Gaussians

Toward Modality Gap: Vision Prototype Learning for Weakly-supervised Semantic Segmentation with CLIP

RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis

ReactGPT: Understanding of Chemical Reactions via In-Context Tuning

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

Concurrent Planning and Execution in Lifelong Multi-Agent Path Finding with Delay Probabilities

AVSegFormer: Audio-Visual Segmentation with Transformer