"semantic alignment" Papers
20 papers found
Adaptive and Multi-scale Affinity Alignment for Hierarchical Contrastive Learning
Jiawei Huang, Minming Li, Hu Ding
DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors
Keon Lee, Dong Won Kim, Jaehyeon Kim et al.
GSAlign: Geometric and Semantic Alignment Network for Aerial-Ground Person Re-Identification
Qiao Li, Jie Li, Yukang Zhang et al.
Layered Image Vectorization via Semantic Simplification
Zhenyu Wang, Jianxi Huang, Zhida Sun et al.
Learning a Cross-Modal Schrödinger Bridge for Visual Domain Generalization
Hao Zheng, Jingjun Yi, Qi Bi et al.
OmniZoom: A Universal Plug-and-Play Paradigm for Cross-Device Smooth Zoom Interpolation
Xiaoan Zhu, Yue Zhao, Tianyang Hu et al.
OOD-Barrier: Build a Middle-Barrier for Open-Set Single-Image Test Time Adaptation via Vision Language Models
Boyang Peng, Sanqing Qu, Tianpei Zou et al.
Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval
Jian Xiao, Zijie Song, Jialong Hu et al.
RespoDiff: Dual-Module Bottleneck Transformation for Responsible & Faithful T2I Generation
Silpa Vadakkeeveetil Sreelatha, Sauradip Nag, Muhammad Awais et al.
SAGI: Semantically Aligned and Uncertainty Guided AI Image Inpainting
Paschalis Giakoumoglou, Dimitrios Karageorgiou, Symeon Papadopoulos et al.
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu, Hao Fei, Xiangtai Li et al.
VideoDPO: Omni-Preference Alignment for Video Diffusion Generation
Runtao Liu, Haoyu Wu, Zheng Ziqiang et al.
Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models
Ruichen Wang, Zekang Chen, Chen Chen et al.
Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation
Xuelu Feng, Dongdong Chen, Junsong Yuan et al.
Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization
Jinlu Zhang, Yiyi Zhou, Qiancheng Zheng et al.
GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections
Shiyue Zhang, Zheng Chong, Xujie Zhang et al.
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Ling Yang, Zhaochen Yu, Chenlin Meng et al.
Prioritized Semantic Learning for Zero-shot Instance Navigation
Xinyu Sun, Lizhao Liu, Hongyan Zhi et al.
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
Shilin Yan, Renrui Zhang, Ziyu Guo et al.