Jinguo Zhu

Papers: 9

Total Citations: 34

Papers (9)

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding

Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling

Complementary Relation Contrastive Distillation

Layerwise Optimization by Gradient Decomposition for Continual Learning

Uni-Perceiver: Pre-Training Unified Architecture for Generic Perception for Zero-Shot and Few-Shot Tasks

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs

VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models