Zhongang Qi

20 Papers
1,540 Total Citations

Papers (20)

T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion

AAAI 2024 (arXiv)
1,423 citations

Taming Rectified Flow for Inversion and Editing

ICML 2025
110 citations

UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning

NeurIPS 2025 (arXiv)
4 citations

Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion

CVPR 2025 (arXiv)
3 citations

DOGR: Towards Versatile Visual Document Grounding and Referring

ICCV 2025
0 citations

Mamba-3VL: Taming State Space Model for 3D Vision Language Learning

ICCV 2025
0 citations

CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities

AAAI 2025
0 citations

VisionMath: Vision-Form Mathematical Problem-Solving

ICCV 2025
0 citations

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

CVPR 2024
0 citations

How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?

CVPR 2024
0 citations

Less is More: Empowering GUI Agent with Context-Aware Simplification

ICCV 2025
0 citations

PointConv: Deep Convolutional Networks on 3D Point Clouds

CVPR 2019
0 citations

Open-Book Video Captioning With Retrieve-Copy-Generate Network

CVPR 2021 (arXiv)
0 citations

BTS: A Bi-Lingual Benchmark for Text Segmentation in the Wild

CVPR 2022
0 citations

LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation

CVPR 2023 (arXiv)
0 citations

ViLEM: Visual-Language Error Modeling for Image-Text Retrieval

CVPR 2023
0 citations

Order-Prompted Tag Sequence Generation for Video Tagging

ICCV 2023
0 citations

MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing

ICCV 2023 (arXiv)
0 citations

Finding Discriminative Filters for Specific Degradations in Blind Super-Resolution

NeurIPS 2021
0 citations

Exploiting Contextual Objects and Relations for 3D Visual Grounding

NeurIPS 2023
0 citations