Learning a Cross-Modal Schrödinger Bridge for Visual Domain Generalization

0 Citations
#1334 of 5858 papers in NeurIPS 2025
9 Authors
4 Data Points

Abstract

Domain generalization aims to train models that perform robustly on unseen target domains without access to target data. Vision-language foundation models have opened a new avenue for this problem owing to their inherent out-of-distribution generalization capability. However, static alignment to class-level textual anchors remains insufficient to handle the large distribution discrepancy arising from diverse domain-specific visual features. In this work, we propose SBGen, a novel cross-domain Schrödinger Bridge (SB) method that addresses this challenge by explicitly formulating the stochastic semantic evolution, yielding better generalization to unseen domains. Technically, SBGen consists of three key components: (1) text-guided domain-aware feature selection, which isolates semantically aligned image tokens; (2) stochastic cross-domain evolution, which simulates the SB dynamics via a learnable time-conditioned drift; and (3) stochastic domain-agnostic interpolation, which constructs semantically grounded feature trajectories. Empirically, SBGen achieves state-of-the-art performance on domain generalization in both classification and segmentation. This work highlights the importance of modeling domain shifts as structured stochastic processes grounded in semantic alignment.
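
To make components (2) and (3) more concrete, the following is a minimal, generic sketch of the two stochastic ingredients the abstract names: Euler-Maruyama simulation of a process driven by a time-conditioned drift, and Brownian-bridge interpolation between two feature vectors. It is not the SBGen implementation. The function names, the noise scale `sigma`, the step count, and the closed-form `toy_drift` (standing in for a learned drift network) are all assumptions made for illustration.

```python
import math
import random


def bridge_interpolate(x0, x1, t, sigma=0.1, rng=random):
    """Sample a point at time t in [0, 1] on a Brownian bridge pinned at x0 and x1."""
    # The bridge variance t * (1 - t) vanishes at both endpoints.
    std = sigma * math.sqrt(t * (1.0 - t))
    return [(1.0 - t) * a + t * b + std * rng.gauss(0.0, 1.0)
            for a, b in zip(x0, x1)]


def simulate_drift(x0, drift, n_steps=50, sigma=0.1, rng=random):
    """Euler-Maruyama integration of dx = drift(x, t) dt + sigma dW over [0, 1]."""
    dt = 1.0 / n_steps
    x = list(x0)
    for k in range(n_steps):
        t = k * dt
        v = drift(x, t)
        x = [xi + vi * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
             for xi, vi in zip(x, v)]
    return x


# Hypothetical toy features: an image feature and a class-level text anchor.
image_feat = [0.0, 0.0, 0.0]
text_anchor = [1.0, -1.0, 0.5]


def toy_drift(x, t):
    # Assumption: the closed-form Brownian-bridge drift toward the anchor,
    # used here in place of a learned time-conditioned network.
    return [(b - xi) / (1.0 - t) for xi, b in zip(x, text_anchor)]


rng = random.Random(0)
midpoint = bridge_interpolate(image_feat, text_anchor, 0.5, rng=rng)
endpoint = simulate_drift(image_feat, toy_drift, rng=rng)
```

With `sigma=0` the simulated path reduces to a deterministic transport from the image feature to the text anchor. A nonzero `sigma` produces the kind of stochastic trajectory the abstract refers to.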

Citation History

Jan 25, 2026: 0
Jan 27, 2026: 0
Jan 27, 2026: 0
Jan 31, 2026: 0