Adaptive and Multi-scale Affinity Alignment for Hierarchical Contrastive Learning
Abstract
Contrastive self-supervised learning has emerged as a powerful paradigm for extracting meaningful representations without labels. While effective at capturing broad categorical distinctions, current methods often struggle to preserve the fine-grained and hierarchical relationships inherent in real-world data. From the perspective of semantic alignment, conventional contrastive learning aligns representations to semantic structure only at a global level, treating the entire embedding space uniformly and frequently overlooking rich local structural information. In this paper, we propose Adaptive Multi-scale Affinity alignment (AMA-alignment), a framework that introduces localized contrastive objectives and a dynamic multi-scale optimization strategy to adaptively identify and refine poorly aligned regions of the embedding space. Although the model is inherently more complex due to its multi-scale and adaptive design, we provide theoretical guarantees that its convergence rate remains comparable to that of standard smooth non-convex optimization. Experiments on diverse benchmarks show that AMA-alignment effectively preserves hierarchical structure and outperforms existing contrastive methods on a range of downstream tasks.