"vision-language models" Papers

167 papers found • Page 3 of 4

CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts

Yichao Cai, Yuhang Liu, Zhen Zhang et al.

ECCV 2024 • poster • arXiv:2311.16445 • 11 citations

CLIM: Contrastive Language-Image Mosaic for Region Representation

Size Wu, Wenwei Zhang, Lumin Xu et al.

AAAI 2024 • paper • arXiv:2312.11376 • 24 citations

Code as Reward: Empowering Reinforcement Learning with VLMs

David Venuto, Mohammad Sami Nur Islam, Martin Klissarov et al.

ICML 2024 • spotlight

COMMA: Co-articulated Multi-Modal Learning

Lianyu Hu, Liqing Gao, Zekang Liu et al.

AAAI 2024 • paper • arXiv:2401.00268

Compound Text-Guided Prompt Tuning via Image-Adaptive Cues

Hao Tan, Jun Li, Yizhuang Zhou et al.

AAAI 2024 • paper • arXiv:2312.06401 • 13 citations

Conceptual Codebook Learning for Vision-Language Models

Yi Zhang, Ke Yu, Siqi Wu et al.

ECCV 2024 • poster • arXiv:2407.02350 • 6 citations

Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models

Zhengbo Wang, Jian Liang, Ran He et al.

ICML 2024 • poster

Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities

Lorenzo Baraldi, Federico Cocchi, Marcella Cornia et al.

ECCV 2024 • poster • arXiv:2407.20337 • 31 citations

DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection

Zhi Zhou, Ming Yang, Jiang-Xin Shi et al.

ICML 2024 • poster

Delving into Multimodal Prompting for Fine-Grained Visual Classification

Xin Jiang, Hao Tang, Junyao Gao et al.

AAAI 2024 • paper • arXiv:2309.08912 • 55 citations

Domain-Controlled Prompt Learning

Qinglong Cao, Zhengqin Xu, Yuntian Chen et al.

AAAI 2024 • paper • arXiv:2310.07730 • 30 citations

Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior

Shuyu Cheng, Yibo Miao, Yinpeng Dong et al.

ICML 2024 • poster

Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection

Chentao Cao, Zhun Zhong, Zhanke Zhou et al.

ICML 2024 • poster

Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models

Mingrui Wu, Jiayi Ji, Oucheng Huang et al.

ICML 2024 • poster

EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding

Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu et al.

ECCV 2024 • poster • arXiv:2308.03135 • 28 citations

Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

Tong Shao, Zhuotao Tian, Hang Zhao et al.

ECCV 2024 • poster • arXiv:2407.08268 • 44 citations

Exploring Intrinsic Dimension for Vision-Language Model Pruning

Hanzhang Wang, Jiawen Zhang, Qingyuan Ma

ICML 2024 • poster

Extracting Training Data From Document-Based VQA Models

Francesco Pinto, Nathalie Rauschmayr, Florian Tramer et al.

ICML 2024 • poster

FineMatch: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction

Hang Hua, Jing Shi, Kushal Kafle et al.

ECCV 2024 • poster • arXiv:2404.14715 • 14 citations

Fool Your (Vision and) Language Model with Embarrassingly Simple Permutations

Yongshuo Zong, Tingyang Yu, Ruchika Chavhan et al.

ICML 2024 • poster

GalLoP: Learning Global and Local Prompts for Vision-Language Models

Marc Lafon, Elias Ramzi, Clément Rambour et al.

ECCV 2024 • poster • arXiv:2407.01400 • 39 citations

Generalizing to Unseen Domains via Text-guided Augmentation

Daiqing Qi, Handong Zhao, Aidong Zhang et al.

ECCV 2024 • poster

GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model

Ling Li, Yu Ye, Bingchuan Jiang et al.

ICML 2024 • poster

Gradient-based Visual Explanation for Transformer-based CLIP

Chenyang Zhao, Kun Wang, Xingyu Zeng et al.

ICML 2024 • poster

HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding

Zhaorun Chen, Zhuokai Zhao, Hongyin Luo et al.

ICML 2024 • poster

Harmonizing Generalization and Personalization in Federated Prompt Learning

Tianyu Cui, Hongxia Li, Jingya Wang et al.

ICML 2024 • poster

Image Hijacks: Adversarial Images can Control Generative Models at Runtime

Luke Bailey, Euan Ong, Stuart Russell et al.

ICML 2024 • poster

Improving fine-grained understanding in image-text pre-training

Ioana Bica, Anastasija Ilic, Matthias Bauer et al.

ICML 2024 • poster

Improving Zero-Shot Generalization for CLIP with Variational Adapter

Ziqian Lu, Fengli Shen, Mushui Liu et al.

ECCV 2024 • poster • 7 citations

Language-Driven Cross-Modal Classifier for Zero-Shot Multi-Label Image Recognition

Yicheng Liu, Jie Wen, Chengliang Liu et al.

ICML 2024 • poster

Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics

Shishira R Maiya, Anubhav Anubhav, Matthew Gwilliam et al.

ECCV 2024 • poster • arXiv:2408.02672 • 7 citations

Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models

Yubin Wang, Xinyang Jiang, De Cheng et al.

AAAI 2024 • paper • arXiv:2312.06323

Let Go of Your Labels with Unsupervised Transfer

Artyom Gadetsky, Yulun Jiang, Maria Brbic

ICML 2024 • poster

MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

Kai Zhang, Yi Luan, Hexiang Hu et al.

ICML 2024 • poster

Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning

Shibo Jie, Yehui Tang, Ning Ding et al.

ICML 2024 • poster

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Kaining Ying, Fanqing Meng, Jin Wang et al.

ICML 2024 • poster

Modeling Caption Diversity in Contrastive Vision-Language Pretraining

Samuel Lavoie, Polina Kirichenko, Mark Ibrahim et al.

ICML 2024 • poster

Multi-Prompts Learning with Cross-Modal Alignment for Attribute-Based Person Re-identification

Yajing Zhai, Yawen Zeng, Zhiyong Huang et al.

AAAI 2024 • paper • arXiv:2312.16797 • 33 citations

OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

Zhening Huang, Xiaoyang Wu, Xi Chen et al.

ECCV 2024 • poster • arXiv:2309.00616 • 82 citations

Open-Vocabulary Calibration for Fine-tuned CLIP

Shuoyuan Wang, Jindong Wang, Guoqing Wang et al.

ICML 2024 • poster

Open Vocabulary Multi-Label Video Classification

Rohit Gupta, Mamshad Nayeem Rizve, Jayakrishnan Unnikrishnan et al.

ECCV 2024 • poster • arXiv:2407.09073 • 5 citations

p-Laplacian Adaptation for Generative Pre-trained Vision-Language Models

Haoyuan Wu, Xinyun Zhang, Peng Xu et al.

AAAI 2024 • paper • arXiv:2312.10613

Position: The Platonic Representation Hypothesis

Minyoung Huh, Brian Cheung, Tongzhou Wang et al.

ICML 2024 • poster

Quantized Prompt for Efficient Generalization of Vision-Language Models

Tianxiang Hao, Xiaohan Ding, Juexiao Feng et al.

ECCV 2024 • poster • arXiv:2407.10704 • 9 citations

Realistic Unsupervised CLIP Fine-tuning with Universal Entropy Optimization

Jian Liang, Lijun Sheng, Zhengbo Wang et al.

ICML 2024 • spotlight

Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving

Ming Nie, Renyuan Peng, Chunwei Wang et al.

ECCV 2024 • poster • arXiv:2312.03661 • 112 citations

Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion

Xuantong Liu, Tianyang Hu, Wenjia Wang et al.

ICML 2024 • poster

Region-centric Image-Language Pretraining for Open-Vocabulary Detection

Dahun Kim, Anelia Angelova, Weicheng Kuo

ECCV 2024 • poster • arXiv:2310.00161 • 6 citations

REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models

Agneet Chatterjee, Yiran Luo, Tejas Gokhale et al.

ECCV 2024 • poster • arXiv:2408.02231 • 10 citations

Revisiting the Role of Language Priors in Vision-Language Models

Zhiqiu Lin, Xinyue Chen, Deepak Pathak et al.

ICML 2024 • poster