ECCV 2024 "vision-language models" Papers
25 papers found
Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts
Yanting Yang, Minghao Chen, Qibo Qiu et al.
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Liang Chen, Haozhe Zhao, Tianyu Liu et al.
CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts
Yichao Cai, Yuhang Liu, Zhen Zhang et al.
Conceptual Codebook Learning for Vision-Language Models
Yi Zhang, Ke Yu, Siqi Wu et al.
Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
Lorenzo Baraldi, Federico Cocchi, Marcella Cornia et al.
EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding
Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu et al.
Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation
Tong Shao, Zhuotao Tian, Hang Zhao et al.
FineMatch: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
Hang Hua, Jing Shi, Kushal Kafle et al.
GalLoP: Learning Global and Local Prompts for Vision-Language Models
Marc Lafon, Elias Ramzi, Clément Rambour et al.
Generalizing to Unseen Domains via Text-guided Augmentation
Daiqing Qi, Handong Zhao, Aidong Zhang et al.
Improving Zero-Shot Generalization for CLIP with Variational Adapter
Ziqian Lu, Fengli Shen, Mushui Liu et al.
Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics
Shishira R Maiya, Anubhav Anubhav, Matthew Gwilliam et al.
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
Zhening Huang, Xiaoyang Wu, Xi Chen et al.
Open Vocabulary Multi-Label Video Classification
Rohit Gupta, Mamshad Nayeem Rizve, Jayakrishnan Unnikrishnan et al.
Quantized Prompt for Efficient Generalization of Vision-Language Models
Tianxiang Hao, Xiaohan Ding, Juexiao Feng et al.
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
Ming Nie, Renyuan Peng, Chunwei Wang et al.
Region-centric Image-Language Pretraining for Open-Vocabulary Detection
Dahun Kim, Anelia Angelova, Weicheng Kuo
REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models
Agneet Chatterjee, Yiran Luo, Tejas Gokhale et al.
Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models
Yu-Chu Yu, Chi-Pin Huang, Jr-Jen Chen et al.
The Hard Positive Truth about Vision-Language Compositionality
Amita Kamath, Cheng-Yu Hsieh, Kai-Wei Chang et al.
Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models
Jiaqi Xu, Mengyang Wu, Xiaowei Hu et al.
Training-free Video Temporal Grounding using Large-scale Pre-trained Models
Minghang Zheng, Xinhao Cai, Qingchao Chen et al.
Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Models
Hao Cheng, Erjia Xiao, Jindong Gu et al.
VCP-CLIP: A Visual Context Prompting Model for Zero-Shot Anomaly Segmentation
Zhen Qu, Xian Tao, Mukesh Prasad et al.
Visual Grounding for Object-Level Generalization in Reinforcement Learning
Haobin Jiang, Zongqing Lu