Zhe Chen

29 Papers
3,137 Total Citations

Papers (29)

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

CVPR 2024
2,210
citations

MUlti-Store Tracker (MUSTer): A Cognitive Psychology Inspired Approach to Object Tracking

CVPR 2015
637
citations

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

ICLR 2024
118
citations

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

ECCV 2024
86
citations

SimDistill: Simulated Multi-Modal Distillation for BEV 3D Object Detection

AAAI 2024 (arXiv)
24
citations

Traffic Flow Optimisation for Lifelong Multi-Agent Path Finding

AAAI 2024 (arXiv)
18
citations

Docopilot: Improving Multimodal Models for Document-Level Understanding

CVPR 2025
14
citations

Online Guidance Graph Optimization for Lifelong Multi-Agent Path Finding

AAAI 2025
8
citations

Incomplete Modality Disentangled Representation for Ophthalmic Disease Grading and Diagnosis

AAAI 2025
7
citations

Structural Information Guided Multimodal Pre-training for Vehicle-Centric Perception

AAAI 2024 (arXiv)
7
citations

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding

CVPR 2025
5
citations

SHeaP: Self-supervised Head Geometry Predictor Learned via 2D Gaussians

ICCV 2025
3
citations

OCHID-Fi: Occlusion-Robust Hand Pose Estimation in 3D via RF-Vision

ICCV 2023
0
citations

DDP: Diffusion Model for Dense Visual Prediction

ICCV 2023 (arXiv)
0
citations

Invertible Neural BRDF for Object Inverse Rendering

ECCV 2020
0
citations

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

CVPR 2025
0
citations

RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis

NeurIPS 2025
0
citations

ReactGPT: Understanding of Chemical Reactions via In-Context Tuning

AAAI 2025
0
citations

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

AAAI 2025
0
citations

Toward Modality Gap: Vision Prototype Learning for Weakly-supervised Semantic Segmentation with CLIP

AAAI 2025
0
citations

Concurrent Planning and Execution in Lifelong Multi-Agent Path Finding with Delay Probabilities

AAAI 2025
0
citations

AVSegFormer: Audio-Visual Segmentation with Transformer

AAAI 2024 (arXiv)
0
citations

Contrastive Boundary Learning for Point Cloud Segmentation

CVPR 2022 (arXiv)
0
citations

Recurrent Glimpse-Based Decoder for Detection With Transformer

CVPR 2022 (arXiv)
0
citations

CLAMP: Prompt-Based Contrastive Learning for Connecting Language and Animal Pose

CVPR 2023 (arXiv)
0
citations

Pose-Disentangled Contrastive Learning for Self-Supervised Facial Representation

CVPR 2023 (arXiv)
0
citations

InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions

CVPR 2023 (arXiv)
0
citations

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

NeurIPS 2023
0
citations

All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation

NeurIPS 2023
0
citations