ComFusion: Enhancing Personalized Generation by Instance-Scene Compositing and Fusion

ECCV 2024
Abstract

Recent advances in personalizing text-to-image (T2I) diffusion models have showcased their ability to generate images grounded in personalized visual concepts from just a few user-provided examples. However, these models often struggle to preserve high visual fidelity, particularly when the scene is modified according to a textual description. To tackle this issue, we present ComFusion, a novel strategy that leverages pretrained models to compose user-supplied subject images with predefined text scenes. ComFusion incorporates a class-scene prior preservation regularization, drawing on composites of subject-class and scene-specific knowledge from pretrained models to boost generation fidelity. Moreover, ComFusion uses coarsely generated images, ensuring they align with both the instance images and the scene texts. Consequently, ComFusion maintains a delicate balance between capturing the subject's essence and preserving scene fidelity. Extensive evaluations of ComFusion against various baselines in T2I personalization demonstrate its qualitative and quantitative superiority.
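The abstract describes the class-scene prior preservation regularization only at a high level. A minimal PyTorch sketch of what such a combined objective could look like is shown below, assuming a DreamBooth-style weighted sum of an instance reconstruction term and a prior-preservation term computed on class-scene composites; the function and argument names (comfusion_style_loss, lambda_prior) and the plain MSE form are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def comfusion_style_loss(
    pred_instance: torch.Tensor,    # noise prediction on the instance-scene batch
    target_instance: torch.Tensor,  # ground-truth noise for the instance-scene batch
    pred_prior: torch.Tensor,       # prediction on class-scene composites (prior batch)
    target_prior: torch.Tensor,     # noise targets generated by the frozen pretrained model
    lambda_prior: float = 1.0,      # assumed weighting of the prior-preservation term
) -> torch.Tensor:
    # Instance term: fit the few user-provided subject images placed in scenes.
    loss_instance = F.mse_loss(pred_instance, target_instance)
    # Prior term: keep the model close to the pretrained class-scene knowledge,
    # which is what prevents overfitting to the subject and losing scene fidelity.
    loss_prior = F.mse_loss(pred_prior, target_prior)
    return loss_instance + lambda_prior * loss_prior
```

In a DreamBooth-style setup, the prior batch would be sampled from images the frozen pretrained model generates for prompts pairing the subject's class word with the predefined scene texts, so the regularizer anchors both class identity and scene rendering during fine-tuning.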
