"vision-language models" Papers

570 papers found • Page 10 of 12

Delving into Multimodal Prompting for Fine-Grained Visual Classification

Xin Jiang, Hao Tang, Junyao Gao et al.

AAAI 2024 • paper • arXiv:2309.08912
59 citations

Discovering and Mitigating Visual Biases through Keyword Explanation

Younghyun Kim, Sangwoo Mo, Minkyu Kim et al.

CVPR 2024 • highlight • arXiv:2301.11104
50 citations

Distilling Vision-Language Models on Millions of Videos

Yue Zhao, Long Zhao, Xingyi Zhou et al.

CVPR 2024 • arXiv:2401.06129
21 citations

Domain-Controlled Prompt Learning

Qinglong Cao, Zhengqin Xu, Yuntian Chen et al.

AAAI 2024 • paper • arXiv:2310.07730
32 citations

Domain Prompt Learning with Quaternion Networks

Qinglong Cao, Zhengqin Xu, Yuntian Chen et al.

CVPR 2024 • highlight • arXiv:2312.08878
23 citations

Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior

Shuyu Cheng, Yibo Miao, Yinpeng Dong et al.

ICML 2024 • arXiv:2405.19098
11 citations

Efficient Inference of Vision Instruction-Following Models with Elastic Cache

Zuyan Liu, Benlin Liu, Jiahui Wang et al.

ECCV 2024 • arXiv:2407.18121
26 citations

Efficient Test-Time Adaptation of Vision-Language Models

Adilbek Karmanov, Dayan Guan, Shijian Lu et al.

CVPR 2024 • arXiv:2403.18293
116 citations

Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection

Chentao Cao, Zhun Zhong, Zhanke Zhou et al.

ICML 2024 • arXiv:2406.00806
28 citations

Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models

Mingrui Wu, Jiayi Ji, Oucheng Huang et al.

ICML 2024 • arXiv:2406.16449
27 citations

EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding

Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu et al.

ECCV 2024 • arXiv:2308.03135
30 citations

Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models

Minchan Kim, Minyeong Kim, Junik Bae et al.

ECCV 2024 • arXiv:2403.16167
10 citations

Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

Tong Shao, Zhuotao Tian, Hang Zhao et al.

ECCV 2024 • arXiv:2407.08268
47 citations

Exploring Intrinsic Dimension for Vision-Language Model Pruning

Hanzhang Wang, Jiawen Zhang, Qingyuan Ma

ICML 2024

Extracting Training Data From Document-Based VQA Models

Francesco Pinto, Nathalie Rauschmayr, Florian Tramèr et al.

ICML 2024 • arXiv:2407.08707
6 citations

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

Shengbang Tong, Zhuang Liu, Yuexiang Zhai et al.

CVPR 2024 • arXiv:2401.06209
593 citations

FFF: Fixing Flawed Foundations in Contrastive Pre-Training Results in Very Strong Vision-Language Models

Adrian Bulat, Yassine Ouali, Georgios Tzimiropoulos

CVPR 2024 • arXiv:2405.10286
8 citations

Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments

Djamahl Etchegaray, Zi Helen Huang, Tatsuya Harada et al.

ECCV 2024 • arXiv:2403.13556
14 citations

FineMatch: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction

Hang Hua, Jing Shi, Kushal Kafle et al.

ECCV 2024 • arXiv:2404.14715
14 citations

Fool Your (Vision and) Language Model with Embarrassingly Simple Permutations

Yongshuo Zong, Tingyang Yu, Ruchika Chavhan et al.

ICML 2024 • arXiv:2310.01651
27 citations

FunQA: Towards Surprising Video Comprehension

Binzhu Xie, Sicheng Zhang, Zitang Zhou et al.

ECCV 2024 • arXiv:2306.14899
36 citations

GalLop: Learning global and local prompts for vision-language models

Marc Lafon, Elias Ramzi, Clément Rambour et al.

ECCV 2024 • arXiv:2407.01400
39 citations

Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction

Hao Li, Ying Chen, Yifei Chen et al.

CVPR 2024 • arXiv:2402.19326
35 citations

Generalizing to Unseen Domains via Text-guided Augmentation

Daiqing Qi, Handong Zhao, Aidong Zhang et al.

ECCV 2024

GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model

Ling Li, Yu Ye, Bingchuan Jiang et al.

ICML 2024 • arXiv:2406.18572
30 citations

Gradient-based Visual Explanation for Transformer-based CLIP

Chenyang Zhao, Kun Wang, Xingyu Zeng et al.

ICML 2024

HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding

Zhaorun Chen, Zhuokai Zhao, Hongyin Luo et al.

ICML 2024 • arXiv:2403.00425
142 citations

HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models

Tianrui Guan, Fuxiao Liu, Xiyang Wu et al.

CVPR 2024 • arXiv:2310.14566
392 citations

Harmonizing Generalization and Personalization in Federated Prompt Learning

Tianyu Cui, Hongxia Li, Jingya Wang et al.

ICML 2024 • arXiv:2405.09771
28 citations

Image Hijacks: Adversarial Images can Control Generative Models at Runtime

Luke Bailey, Euan Ong, Stuart Russell et al.

ICML 2024 • arXiv:2309.00236
142 citations

Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions

Oindrila Saha, Grant Van Horn, Subhransu Maji

CVPR 2024 • arXiv:2401.02460
65 citations

Improving fine-grained understanding in image-text pre-training

Ioana Bica, Anastasija Ilic, Matthias Bauer et al.

ICML 2024 • arXiv:2401.09865
46 citations

Improving Zero-Shot Generalization for CLIP with Variational Adapter

Ziqian Lu, Fengli Shen, Mushui Liu et al.

ECCV 2024
7 citations

Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation

Marco Mistretta, Alberto Baldrati, Marco Bertini et al.

ECCV 2024 • arXiv:2407.03056
20 citations

INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding

Jiha Jang, Hoigi Seo, Se Young Chun

ECCV 2024 • arXiv:2409.06210
11 citations

Iterated Learning Improves Compositionality in Large Vision-Language Models

Chenhao Zheng, Jieyu Zhang, Aniruddha Kembhavi et al.

CVPR 2024 • arXiv:2404.02145
16 citations

LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction

Penghui Du, Yu Wang, Yifan Sun et al.

ECCV 2024 • arXiv:2407.11335
16 citations

Language-Driven Cross-Modal Classifier for Zero-Shot Multi-Label Image Recognition

Yicheng Liu, Jie Wen, Chengliang Liu et al.

ICML 2024

Language-Informed Visual Concept Learning

Sharon Lee, Yunzhi Zhang, Shangzhe Wu et al.

ICLR 2024 • arXiv:2312.03587
12 citations

Large Language Models are Good Prompt Learners for Low-Shot Image Classification

Zhaoheng Zheng, Jingmin Wei, Xuefeng Hu et al.

CVPR 2024 • arXiv:2312.04076
23 citations

Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics

Shishira R Maiya, Anubhav Anubhav, Matthew Gwilliam et al.

ECCV 2024 • arXiv:2408.02672
7 citations

Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models

Yubin Wang, Xinyang Jiang, De Cheng et al.

AAAI 2024 • paper • arXiv:2312.06323
42 citations

Learning Object State Changes in Videos: An Open-World Perspective

Zihui Xue, Kumar Ashutosh, Kristen Grauman

CVPR 2024 • arXiv:2312.11782
34 citations

Let Go of Your Labels with Unsupervised Transfer

Artyom Gadetsky, Yulun Jiang, Maria Brbic

ICML 2024 • arXiv:2406.07236
14 citations

Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification

Sravanti Addepalli, Ashish Asokan, Lakshay Sharma et al.

CVPR 2024 • arXiv:2310.08255
48 citations

LingoQA: Video Question Answering for Autonomous Driving

Ana-Maria Marcu, Long Chen, Jan Hünermann et al.

ECCV 2024
34 citations

MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

Kai Zhang, Yi Luan, Hexiang Hu et al.

ICML 2024 • arXiv:2403.19651
88 citations

MarineInst: A Foundation Model for Marine Image Analysis with Instance Visual Description

Ziqiang Zheng, Yiwei Chen, Huimin Zeng et al.

ECCV 2024

Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning

Shibo Jie, Yehui Tang, Ning Ding et al.

ICML 2024 • arXiv:2405.05615
20 citations

Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

Sicong Leng, Hang Zhang, Guanzheng Chen et al.

CVPR 2024 • highlight • arXiv:2311.16922
487 citations