Ji Zhang

Papers: 16

Total Citations: 852

Affiliations: 1

Affiliations

Alibaba

Papers (16)

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model

DePT: Decoupled Prompt Tuning

SubT-MRS Dataset: Pushing SLAM Towards All-weather Environments

SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization

A Simple yet Effective Layout Token in Large Language Models for Document Understanding

TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training

AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization

MOSAIC: Generating Consistent, Privacy-Preserving Scenes from Multiple Depth Views in Multi-Room Environments

Graphical Contrastive Losses for Scene Graph Parsing

Accurate Few-Shot Object Detection With Support-Query Mutual Guidance and Hybrid Loss

Shifting More Attention to Visual Backbone: Query-Modulated Refinement Networks for End-to-End Visual Grounding

DETA: Denoised Task Adaptation for Few-Shot Learning

HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training

Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves

Relationship Proposal Networks