A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting

15 citations

arXiv:2401.10227

Citations: #514 of 2387 papers in ECCV 2024

Authors

Wouter Van Gansbeke, Bert De Brabandere

Abstract

Panoptic and instance segmentation networks are often trained with specialized object detection modules, complex loss functions, and ad-hoc post-processing steps to handle the permutation-invariance of the instance masks. This work builds upon Stable Diffusion and proposes a latent diffusion approach for panoptic segmentation, resulting in a simple architecture which omits these complexities. Our training process consists of two steps: (1) training a shallow autoencoder to project the segmentation masks to latent space; (2) training a diffusion model to allow image-conditioned sampling in latent space. The use of a generative model unlocks the exploration of mask completion or inpainting, which has applications in interactive segmentation. The experimental validation on COCO and ADE20k yields strong results for segmentation tasks. Finally, we demonstrate the approach's adaptability to a multi-task setting by introducing learnable task embeddings.
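The permutation-invariance mentioned in the abstract can be made concrete with a small sketch: two instance-ID maps describe the same segmentation whenever their IDs differ only by a one-to-one relabeling. The helper name `same_partition` and the toy masks are hypothetical illustrations and are not taken from the paper or its code.

```python
def same_partition(mask_a, mask_b):
    """Return True if two instance-ID maps (lists of rows) describe the
    same segments up to a one-to-one relabeling of the instance IDs."""
    if len(mask_a) != len(mask_b):
        return False
    a_to_b, b_to_a = {}, {}
    for row_a, row_b in zip(mask_a, mask_b):
        if len(row_a) != len(row_b):
            return False
        for a, b in zip(row_a, row_b):
            # Each ID in mask_a must map to exactly one ID in mask_b,
            # and each ID in mask_b must map back to exactly one ID in mask_a.
            if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
                return False
    return True


# Toy 2x3 masks (hypothetical data): same segments, different instance IDs.
mask_a = [[1, 1, 2],
          [1, 3, 2]]
mask_b = [[7, 7, 4],
          [7, 9, 4]]
# A genuinely different segmentation: the centre pixel joins another segment.
mask_c = [[1, 1, 2],
          [1, 2, 2]]

print(same_partition(mask_a, mask_b))
print(same_partition(mask_a, mask_c))
```

Because any such relabeling is an equally valid target, a per-pixel classification loss on raw instance IDs is ill-defined, which is one reason conventional pipelines rely on matching-based losses and post-processing.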
