Separate and Diffuse: Using a Pretrained Diffusion Model for Better Source Separation

0citations

Project

Citations

#595

in ICLR 2024

of 2297 papers

Authors

Data Points

Authors

Shahar Lutati Eliya Nachmani Lior Wolf

Abstract

The problem of speech separation, also known as the cocktail party problem,refers to the task of isolating a single speech signal from a mixture of speechsignals. Previous work on source separation derived an upper bound for thesource separation task in the domain of human speech. This bound is derived fordeterministic models. Recent advancements in generative models challenge thisbound. We show how the upper bound can be generalized to the case of randomgenerative models. Applying a diffusion model Vocoder that was pretrained tomodel single-speaker voices on the output of a deterministic separation model leadsto state-of-the-art separation results. It is shown that this requires one to combinethe output of the separation model with that of the diffusion model. In our method,a linear combination is performed, in the frequency domain, using weights that areinferred by a learned model. We show state-of-the-art results on 2, 3, 5, 10, and 20speakers on multiple benchmarks. In particular, for two speakers, our method isable to surpass what was previously considered the upper performance bound.

Citation History

Jan 28, 2026