Poster "zero-shot classification" Papers
22 papers found
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation
Reza Abbasi, Ali Nazari, Aminreza Sefid et al.
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion
Marco Mistretta, Alberto Baldrati, Lorenzo Agnolucci et al.
CSA: Data-efficient Mapping of Unimodal Features to Multimodal Features
Po-han Li, Sandeep Chinchali, Ufuk Topcu
Harnessing Frozen Unimodal Encoders for Flexible Multimodal Alignment
Mayug Maniparambil, Raiymbek Akshulakov, Yasser Abdelaziz Dahou Djilali et al.
Mitigate the Gap: Improving Cross-Modal Alignment in CLIP
Sedigheh Eslami, Gerard de Melo
MobileViCLIP: An Efficient Video-Text Model for Mobile Devices
Min Yang, Zihan Jia, Zhilin Dai et al.
NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics
David Robinson, Marius Miron, Masato Hagiwara et al.
On Large Multimodal Models as Open-World Image Classifiers
Alessandro Conti, Massimiliano Mancini, Enrico Fini et al.
Semi-Supervised CLIP Adaptation by Enforcing Semantic and Trapezoidal Consistency
Kai Gan, Bo Ye, Min-Ling Zhang et al.
VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set
Shufan Shen, Junshu Sun, Qingming Huang et al.
Adversarial Robustification via Text-to-Image Diffusion Models
Daewon Choi, Jongheon Jeong, Huiwon Jang et al.
Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks
Wenhan Yang, Jingdong Gao, Baharan Mirzasoleiman
CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts
Yichao Cai, Yuhang Liu, Zhen Zhang et al.
IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers
Chenglin Yang, Siyuan Qiao, Yuan Cao et al.
Modeling Caption Diversity in Contrastive Vision-Language Pretraining
Samuel Lavoie, Polina Kirichenko, Mark Ibrahim et al.
Multi-Label Cluster Discrimination for Visual Representation Learning
Xiang An, Kaicheng Yang, Xiangzi Dai et al.
Online Zero-Shot Classification with CLIP
Qi Qian, Juhua Hu
OT-CLIP: Understanding and Generalizing CLIP via Optimal Transport
Liangliang Shi, Jack Fan, Junchi Yan
Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
Christian Schlarmann, Naman Singh, Francesco Croce et al.
SILC: Improving Vision Language Pretraining with Self-Distillation
Muhammad Ferjad Naeem, Yongqin Xian, Xiaohua Zhai et al.
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment
Ziping Ma, Furong Xu, Jian Liu et al.
Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement
Che Liu, Zhongwei Wan, Cheng Ouyang et al.
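The listing gives no method details, but nearly all of the papers above build on or analyze CLIP-style zero-shot classification: encode candidate class prompts and an image into a shared embedding space, then pick the class whose prompt is most similar to the image. As context only, here is a minimal sketch of that shared baseline using the Hugging Face transformers library and the public openai/clip-vit-base-patch32 checkpoint; the image path and label set are placeholders, and this is the generic recipe, not the method of any listed paper.

from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

# Load a public CLIP checkpoint (assumption: the generic baseline,
# not any specific method from the papers above).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

# Class names wrapped in a prompt template; no task-specific training.
labels = ["cat", "dog", "bird"]          # placeholder label set
prompts = [f"a photo of a {c}" for c in labels]
image = Image.open("example.jpg")        # placeholder input image

inputs = processor(text=prompts, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; a softmax over
# the candidate prompts yields zero-shot class probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)[0]
for label, p in zip(labels, probs.tolist()):
    print(f"{label}: {p:.3f}")

Because classification reduces to nearest-prompt retrieval in the joint embedding space, swapping in a new label set requires no retraining, which is the property the papers above variously probe, harden, or extend.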