A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting

Citations: 15
Rank: #514 of 2387 papers in ECCV 2024
Authors: 2
Data Points: 1

Abstract

Panoptic and instance segmentation networks are often trained with specialized object detection modules, complex loss functions, and ad-hoc post-processing steps to handle the permutation-invariance of the instance masks. This work builds upon Stable Diffusion and proposes a latent diffusion approach for panoptic segmentation, resulting in a simple architecture which omits these complexities. Our training process consists of two steps: (1) training a shallow autoencoder to project the segmentation masks to latent space; (2) training a diffusion model to allow image-conditioned sampling in latent space. The use of a generative model unlocks the exploration of mask completion or inpainting, which has applications in interactive segmentation. The experimental validation on COCO and ADE20k yields strong results for segmentation tasks. Finally, we demonstrate the approach's adaptability to a multi-task setting by introducing learnable task embeddings.
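
The abstract does not spell out the training objective for step (2). The sketch below assumes the standard noise-prediction setup used by DDPM-style latent diffusion models such as Stable Diffusion: a mask latent from step (1) is corrupted with Gaussian noise, and a denoiser conditioned on the image is trained to predict that noise. The linear schedule, its constants, the toy four-element latent, and all function names are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed schedule, not from the paper)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product alpha_bar_t = prod_{s <= t} (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(z0, t, alpha_bars, rng):
    """Forward process: z_t = sqrt(a_bar_t) * z0 + sqrt(1 - a_bar_t) * eps."""
    a_bar = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a_bar) * z + math.sqrt(1.0 - a_bar) * e
          for z, e in zip(z0, eps)]
    return zt, eps


def noise_prediction_loss(eps_pred, eps):
    """Mean squared error between predicted and true noise."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))

# Hypothetical stand-in for a flattened mask latent produced by the autoencoder.
z0 = [0.5, -1.0, 0.25, 0.0]
zt, eps = add_noise(z0, t=500, alpha_bars=alpha_bars, rng=rng)

# A real denoiser would map (zt, t, image features) to a noise estimate.
# Here an oracle that returns the true noise stands in for it.
loss = noise_prediction_loss(eps, eps)
```

In an actual training loop the oracle would be replaced by a network, and the timestep `t` would be sampled at random for each example. Sampling then runs the learned denoiser in reverse, starting from pure noise, and decodes the resulting latent back to a segmentation mask.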

Citation History

Jan 26, 2026: 15 citations