Poster Papers: "vision-language models"

475 papers found • Page 10 of 10

Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation

Xinyao Li, Yuke Li, Zhekai Du et al.

CVPR 2024 • arXiv:2403.06946
19 citations

SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant

Guohao Sun, Can Qin, Jiaminan Wang et al.

ECCV 2024 • arXiv:2403.11299
24 citations

Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation

Razvan Pasca, Alexey Gavryushin, Muhammad Hamza et al.

CVPR 2024 • arXiv:2301.09209
22 citations

SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment

Ziping Ma, Furong Xu, Jian Liu et al.

ICML 2024 • arXiv:2401.02137
7 citations

TF-FAS: Twofold-Element Fine-Grained Semantic Guidance for Generalizable Face Anti-Spoofing

Xudong Wang, Ke-Yue Zhang, Taiping Yao et al.

ECCV 2024
11 citations

The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?

Qinyu Zhao, Ming Xu, Kartik Gupta et al.

ECCV 2024 • arXiv:2403.09037
15 citations

The Hard Positive Truth about Vision-Language Compositionality

Amita Kamath, Cheng-Yu Hsieh, Kai-Wei Chang et al.

ECCV 2024 • arXiv:2409.17958
16 citations

Towards Neuro-Symbolic Video Understanding

Minkyu Choi, Harsh Goel, Mohammad Omama et al.

ECCV 2024 • arXiv:2403.11021
19 citations

Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models

Jiaqi Xu, Mengyang Wu, Xiaowei Hu et al.

ECCV 2024 • arXiv:2409.02101
12 citations

Training-free Video Temporal Grounding using Large-scale Pre-trained Models

Minghang Zheng, Xinhao Cai, Qingchao Chen et al.

ECCV 2024 • arXiv:2408.16219
21 citations

Transferable and Principled Efficiency for Open-Vocabulary Segmentation

Jingxuan Xu, Wuyang Chen, Yao Zhao et al.

CVPR 2024 • arXiv:2404.07448
1 citation

Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models

Chen Ju, Haicheng Wang, Haozhe Cheng et al.

ECCV 2024 • arXiv:2407.11717
13 citations

Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models

Yifei Ming, Sharon Li

ICML 2024 • arXiv:2405.01468
10 citations

Unknown Prompt the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization

Mainak Singha, Ankit Jha, Shirsha Bose et al.

CVPR 2024 • arXiv:2404.00710
24 citations

Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image

Pengkun Jiao, Na Zhao, Jingjing Chen et al.

ECCV 2024 • arXiv:2407.05256
13 citations

Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Models

Hao Cheng, Erjia Xiao, Jindong Gu et al.

ECCV 2024 • arXiv:2402.19150
15 citations

VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation

Zhen Qu, Xian Tao, Mukesh Prasad et al.

ECCV 2024 • arXiv:2407.12276
58 citations

VicTR: Video-conditioned Text Representations for Activity Recognition

Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani et al.

CVPR 2024 • arXiv:2304.02560
38 citations

VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

Ofir Abramovich, Niv Nayman, Sharon Fogel et al.

ECCV 2024 • arXiv:2407.12594
6 citations

Vision-Language Dual-Pattern Matching for Out-of-Distribution Detection

Zihan Zhang, Zhuo Xu, Xiang Xiang

ECCV 2024
7 citations

VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions

Seokha Moon, Hyun Woo, Hongbeen Park et al.

ECCV 2024 • arXiv:2407.12345
22 citations

Visual Grounding for Object-Level Generalization in Reinforcement Learning

Haobin Jiang, Zongqing Lu

ECCV 2024 • arXiv:2408.01942
4 citations

Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models

Jinhao Li, Haopeng Li, Sarah Erfani et al.

ICML 2024 • arXiv:2406.02915
26 citations

WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding

Quan Kong, Yuki Kawana, Rajat Saini et al.

ECCV 2024 • arXiv:2407.15350
21 citations

X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization

Anna Kukleva, Fadime Sener, Edoardo Remelli et al.

CVPR 2024 • arXiv:2403.19811
5 citations