Ji Zhang

16
Papers
852
Total Citations
1
Affiliation

Affiliations

Alibaba

Papers (16)

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

CVPR 2024
601
citations

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model

CVPR 2024
116
citations

DePT: Decoupled Prompt Tuning

CVPR 2024
60
citations

SubT-MRS Dataset: Pushing SLAM Towards All-weather Environments

CVPR 2024
49
citations

SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization

CVPR 2025
7
citations

A Simple yet Effective Layout Token in Large Language Models for Document Understanding

CVPR 2025, arXiv
7
citations

TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training

AAAI 2024, arXiv
6
citations

AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization

CVPR 2025
4
citations

MOSAIC: Generating Consistent, Privacy-Preserving Scenes from Multiple Depth Views in Multi-Room Environments

ICCV 2025
2
citations

Graphical Contrastive Losses for Scene Graph Parsing

CVPR 2019
0
citations

Accurate Few-Shot Object Detection With Support-Query Mutual Guidance and Hybrid Loss

CVPR 2021
0
citations

Shifting More Attention to Visual Backbone: Query-Modulated Refinement Networks for End-to-End Visual Grounding

CVPR 2022, arXiv
0
citations

DETA: Denoised Task Adaptation for Few-Shot Learning

ICCV 2023, arXiv
0
citations

HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training

ICCV 2023, arXiv
0
citations

Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves

CVPR 2025
0
citations

Relationship Proposal Networks

CVPR 2017
0
citations