MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation

8 citations

arXiv:2505.02648

Citations

#544 of 2873 papers in CVPR 2025

Authors

Mingcheng Li, Xiaolu Hou, Ziyang Liu, Dingkang Yang, Ziyun Qian, Jiawei Chen, Jinjie Wei, Yue Jiang, Qingyao Xu, Lihua Zhang

Topics

text-to-image generation, diffusion models, multi-agent collaboration, scene parsing, complex scene generation, region enhancement, hierarchical compositional diffusion

Abstract

Diffusion models have shown excellent performance in text-to-image generation. Nevertheless, existing methods often hit performance bottlenecks on complex prompts that involve multiple objects, characteristics, and relations. Therefore, we propose Multi-agent Collaboration-based Compositional Diffusion (MCCD) for text-to-image generation of complex scenes. Specifically, we design a multi-agent collaboration-based scene parsing module that builds an agent system of multiple agents with distinct tasks and uses MLLMs to extract the various scene elements effectively. In addition, hierarchical compositional diffusion uses a Gaussian mask and filtering to refine bounding box regions and strengthens objects through region enhancement, yielding accurate, high-fidelity generation of complex scenes. Comprehensive experiments demonstrate that MCCD significantly improves the performance of the baseline models in a training-free manner, providing a substantial advantage in complex scene generation.
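The abstract does not say how the Gaussian mask and filtering are defined, so the following is only a minimal sketch of one plausible reading: a soft weight map that peaks at the centre of a bounding box and is then thresholded. The function names `gaussian_box_mask` and `filter_mask`, the `sigma_scale` parameter, and the threshold value are all hypothetical and are not taken from the paper.

```python
import numpy as np

def gaussian_box_mask(height, width, box, sigma_scale=0.5):
    """Soft mask that peaks at the centre of box = (x0, y0, x1, y1), in pixels."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    # Assumption: the standard deviation is proportional to the box half-size.
    sx = max((x1 - x0) / 2.0 * sigma_scale, 1e-6)
    sy = max((y1 - y0) / 2.0 * sigma_scale, 1e-6)
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-0.5 * (((xs - cx) / sx) ** 2 + ((ys - cy) / sy) ** 2))

def filter_mask(mask, threshold=0.1):
    """Zero out weights below threshold (a stand-in for the 'filtering' step)."""
    return np.where(mask >= threshold, mask, 0.0)

# Example: a 64x64 map with one object box in the middle.
mask = filter_mask(gaussian_box_mask(64, 64, (16, 16, 48, 48)))
```

A mask of this kind could be used to weight a region's contribution so that its influence fades toward the box edges rather than cutting off sharply; whether MCCD applies it in latent space, attention space, or elsewhere is not stated in the abstract.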

Citation History

[Chart: citation count tracked on Jan 26, 2026, Jan 27, 2026, and Feb 1, 2026]