"semantic alignment" Papers

20 papers found

Adaptive and Multi-scale Affinity Alignment for Hierarchical Contrastive Learning

Jiawei Huang, Minming Li, Hu Ding

NeurIPS 2025 poster

DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors

Keon Lee, Dong Won Kim, Jaehyeon Kim et al.

ICLR 2025 poster, arXiv:2406.11427, 28 citations

GSAlign: Geometric and Semantic Alignment Network for Aerial-Ground Person Re-Identification

Qiao Li, Jie Li, Yukang Zhang et al.

NeurIPS 2025 poster, arXiv:2510.22268, 1 citation

Layered Image Vectorization via Semantic Simplification

Zhenyu Wang, Jianxi Huang, Zhida Sun et al.

CVPR 2025 poster, arXiv:2406.05404, 9 citations

Learning a Cross-Modal Schrödinger Bridge for Visual Domain Generalization

Hao Zheng, Jingjun Yi, Qi Bi et al.

NeurIPS 2025 poster

OmniZoom: A Universal Plug-and-Play Paradigm for Cross-Device Smooth Zoom Interpolation

Xiaoan Zhu, Yue Zhao, Tianyang Hu et al.

NeurIPS 2025 poster

OOD-Barrier: Build a Middle-Barrier for Open-Set Single-Image Test Time Adaptation via Vision Language Models

Boyang Peng, Sanqing Qu, Tianpei Zou et al.

NeurIPS 2025 poster

Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval

Jian Xiao, Zijie Song, Jialong Hu et al.

NeurIPS 2025 poster, arXiv:2505.12499

RespoDiff: Dual-Module Bottleneck Transformation for Responsible & Faithful T2I Generation

Silpa Vadakkeeveetil Sreelatha, Sauradip Nag, Muhammad Awais et al.

NeurIPS 2025 poster, arXiv:2509.15257

SAGI: Semantically Aligned and Uncertainty Guided AI Image Inpainting

Paschalis Giakoumoglou, Dimitrios Karageorgiou, Symeon Papadopoulos et al.

ICCV 2025 poster, arXiv:2502.06593, 2 citations

Towards Semantic Equivalence of Tokenization in Multimodal LLM

Shengqiong Wu, Hao Fei, Xiangtai Li et al.

ICLR 2025 poster, arXiv:2406.05127, 58 citations

VideoDPO: Omni-Preference Alignment for Video Diffusion Generation

Runtao Liu, Haoyu Wu, Zheng Ziqiang et al.

CVPR 2025 poster, arXiv:2412.14167, 68 citations

Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models

Ruichen Wang, Zekang Chen, Chen Chen et al.

AAAI 2024 paper, arXiv:2305.13921, 92 citations

Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation

Xuelu Feng, Dongdong Chen, Junsong Yuan et al.

ECCV 2024 poster, arXiv:2403.12042, 17 citations

Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization

Jinlu Zhang, Yiyi Zhou, Qiancheng Zheng et al.

ICML 2024 poster

GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections

Shiyue Zhang, Zheng Chong, Xujie Zhang et al.

ECCV 2024 poster, arXiv:2408.12352

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

Ling Yang, Zhaochen Yu, Chenlin Meng et al.

ICML 2024 poster

Prioritized Semantic Learning for Zero-shot Instance Navigation

Xinyu Sun, Lizhao Liu, Hongyan Zhi et al.

ECCV 2024 poster, arXiv:2403.11650, 22 citations

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

Shilin Yan, Renrui Zhang, Ziyu Guo et al.

AAAI 2024 paper, arXiv:2305.16318, 58 citations

Semantic Lens: Instance-Centric Semantic Alignment for Video Super-resolution

AAAI 2024 paper, arXiv:2312.07823