ECCV 2024 "vision-language models" Papers

25 papers found

Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts

Yanting Yang, Minghao Chen, Qibo Qiu et al.

ECCV 2024 poster · arXiv:2407.14872 · 4 citations

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Liang Chen, Haozhe Zhao, Tianyu Liu et al.

ECCV 2024 poster · arXiv:2403.06764 · 343 citations

CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts

Yichao Cai, Yuhang Liu, Zhen Zhang et al.

ECCV 2024 poster · arXiv:2311.16445 · 11 citations

Conceptual Codebook Learning for Vision-Language Models

Yi Zhang, Ke Yu, Siqi Wu et al.

ECCV 2024 poster · arXiv:2407.02350 · 6 citations

Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities

Lorenzo Baraldi, Federico Cocchi, Marcella Cornia et al.

ECCV 2024 poster · arXiv:2407.20337 · 31 citations

EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding

Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu et al.

ECCV 2024 poster · arXiv:2308.03135 · 28 citations

Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

Tong Shao, Zhuotao Tian, Hang Zhao et al.

ECCV 2024 poster · arXiv:2407.08268 · 44 citations

FineMatch: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction

Hang Hua, Jing Shi, Kushal Kafle et al.

ECCV 2024 poster · arXiv:2404.14715 · 14 citations

GalLoP: Learning Global and Local Prompts for Vision-Language Models

Marc Lafon, Elias Ramzi, Clément Rambour et al.

ECCV 2024 poster · arXiv:2407.01400 · 39 citations

Generalizing to Unseen Domains via Text-guided Augmentation

Daiqing Qi, Handong Zhao, Aidong Zhang et al.

ECCV 2024 poster

Improving Zero-Shot Generalization for CLIP with Variational Adapter

Ziqian Lu, Fengli Shen, Mushui Liu et al.

ECCV 2024 poster · 7 citations

Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics

Shishira R Maiya, Anubhav Anubhav, Matthew Gwilliam et al.

ECCV 2024 poster · arXiv:2408.02672 · 7 citations

OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

Zhening Huang, Xiaoyang Wu, Xi Chen et al.

ECCV 2024 poster · arXiv:2309.00616 · 82 citations

Open Vocabulary Multi-Label Video Classification

Rohit Gupta, Mamshad Nayeem Rizve, Jayakrishnan Unnikrishnan et al.

ECCV 2024 poster · arXiv:2407.09073 · 5 citations

Quantized Prompt for Efficient Generalization of Vision-Language Models

Tianxiang Hao, Xiaohan Ding, Juexiao Feng et al.

ECCV 2024 poster · arXiv:2407.10704 · 9 citations

Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving

Ming Nie, Renyuan Peng, Chunwei Wang et al.

ECCV 2024 poster · arXiv:2312.03661 · 112 citations

Region-centric Image-Language Pretraining for Open-Vocabulary Detection

Dahun Kim, Anelia Angelova, Weicheng Kuo

ECCV 2024 poster · arXiv:2310.00161 · 6 citations

REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models

Agneet Chatterjee, Yiran Luo, Tejas Gokhale et al.

ECCV 2024 poster · arXiv:2408.02231 · 10 citations

Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models

Yu-Chu Yu, Chi-Pin Huang, Jr-Jen Chen et al.

ECCV 2024 poster · arXiv:2403.09296 · 16 citations

The Hard Positive Truth about Vision-Language Compositionality

Amita Kamath, Cheng-Yu Hsieh, Kai-Wei Chang et al.

ECCV 2024 poster · arXiv:2409.17958 · 15 citations

Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models

Jiaqi Xu, Mengyang Wu, Xiaowei Hu et al.

ECCV 2024 poster · arXiv:2409.02101 · 12 citations

Training-free Video Temporal Grounding using Large-scale Pre-trained Models

Minghang Zheng, Xinhao Cai, Qingchao Chen et al.

ECCV 2024 poster · arXiv:2408.16219 · 20 citations

Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Models

Hao Cheng, Erjia Xiao, Jindong Gu et al.

ECCV 2024 poster · arXiv:2402.19150 · 15 citations

VCP-CLIP: A Visual Context Prompting Model for Zero-Shot Anomaly Segmentation

Zhen Qu, Xian Tao, Mukesh Prasad et al.

ECCV 2024 poster · arXiv:2407.12276 · 55 citations

Visual Grounding for Object-Level Generalization in Reinforcement Learning

Haobin Jiang, Zongqing Lu

ECCV 2024 poster · arXiv:2408.01942 · 3 citations