"vision-language models" Papers
570 papers found • Page 12 of 12
Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models
Jiaqi Xu, Mengyang Wu, Xiaowei Hu et al.
Training-free Video Temporal Grounding using Large-scale Pre-trained Models
Minghang Zheng, Xinhao Cai, Qingchao Chen et al.
Transductive Zero-Shot and Few-Shot CLIP
Ségolène Martin, Yunshi Huang, Fereshteh Shakeri et al.
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Jingxuan Xu, Wuyang Chen, Yao Zhao et al.
Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
Chen Ju, Haicheng Wang, Haozhe Cheng et al.
Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models
Yifei Ming, Sharon Li
Unknown Prompt, the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization
Mainak Singha, Ankit Jha, Shirsha Bose et al.
Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image
Pengkun Jiao, Na Zhao, Jingjing Chen et al.
Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Models
Hao Cheng, Erjia Xiao, Jindong Gu et al.
VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection
Peng Wu, Xuerong Zhou, Guansong Pang et al.
VCP-CLIP: A Visual Context Prompting Model for Zero-Shot Anomaly Segmentation
Zhen Qu, Xian Tao, Mukesh Prasad et al.
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani et al.
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Ofir Abramovich, Niv Nayman, Sharon Fogel et al.
Vision-Language Dual-Pattern Matching for Out-of-Distribution Detection
Zihan Zhang, Zhuo Xu, Xiang Xiang
VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions
Seokha Moon, Hyun Woo, Hongbeen Park et al.
Visual Grounding for Object-Level Generalization in Reinforcement Learning
Haobin Jiang, Zongqing Lu
Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
Jinhao Li, Haopeng Li, Sarah Erfani et al.
Weak Distribution Detectors Lead to Stronger Generalizability of Vision-Language Prompt Tuning
Kun Ding, Haojian Zhang, Qiang Yu et al.
WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding
Quan Kong, Yuki Kawana, Rajat Saini et al.
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
Anna Kukleva, Fadime Sener, Edoardo Remelli et al.