Paper "vision-language models" Papers
38 papers found
A Wander Through the Multimodal Landscape: Efficient Transfer Learning via Low-rank Sequence Multimodal Adapter
Zirun Guo, Xize Cheng, Yangyang Wu et al.
CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models
Zihui Cheng, Qiguang Chen, Jin Zhang et al.
Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective
Xinmiao Yu, Xiaocheng Feng, Yun Li et al.
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
Yuxuan Wang, Yijun Liu, Fei Yu et al.
Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach
Yanming Xiu, Maria Gorlatova
Extract Free Dense Misalignment from CLIP
JeongYeon Nam, Jinbae Im, Wonjae Kim et al.
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
Yichen Gong, Delong Ran, Jinyuan Liu et al.
Hierarchical Cross-Modal Alignment for Open-Vocabulary 3D Object Detection
Youjun Zhao, Jiaying Lin, Rynson W. H. Lau
Interpreting the Linear Structure of Vision-Language Model Embedding Spaces
Isabel Papadimitriou, Huangyuan Su, Thomas Fel et al.
Is Your Image a Good Storyteller?
Xiujie Song, Xiaoyi Pang, Haifeng Tang et al.
Language Prompt for Autonomous Driving
Dongming Wu, Wencheng Han, Yingfei Liu et al.
Learning to Prompt with Text Only Supervision for Vision-Language Models
Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer et al.
LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba
Yubo Cui, Zhiheng Li, Jiaqiang Wang et al.
LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception
Yuan-Hong Liao, Sven Elflein, Liu He et al.
Low-Light Image Enhancement via Generative Perceptual Priors
Han Zhou, Wei Dong, Xiaohong Liu et al.
Mind the Uncertainty in Human Disagreement: Evaluating Discrepancies Between Model Predictions and Human Responses in VQA
Jian Lan, Diego Frassinelli, Barbara Plank
MVREC: A General Few-shot Defect Classification Model Using Multi-View Region-Context
Shuai Lyu, Rongchen Zhang, Zeqi Ma et al.
Position-Aware Guided Point Cloud Completion with CLIP Model
Feng Zhou, Qi Zhang, Ju Dai et al.
Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models
Quang-Hung Le, Long Hoang Dang, Ngan Hoang Le et al.
Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification
Jiaxiang Gou, Luping Ji, Pei Liu et al.
SKI Models: Skeleton Induced Vision-Language Embeddings for Understanding Activities of Daily Living
Arkaprava Sinha, Dominick Reilly, Francois Bremond et al.
Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection
Kaiqing Lin, Yuzhen Lin, Weixiang Li et al.
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Hang Hua, Yunlong Tang, Chenliang Xu et al.
Vision-aware Multimodal Prompt Tuning for Uploadable Multi-source Few-shot Domain Adaptation
Kuanghong Liu, Jin Wang, Kangjian He et al.
VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information
Ryo Kamoi, Yusen Zhang, Sarkar Snigdha Sarathi Das et al.
AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models
Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.
CLIM: Contrastive Language-Image Mosaic for Region Representation
Size Wu, Wenwei Zhang, Lumin Xu et al.
COMMA: Co-articulated Multi-Modal Learning
Lianyu Hu, Liqing Gao, Zekang Liu et al.
Compound Text-Guided Prompt Tuning via Image-Adaptive Cues
Hao Tan, Jun Li, Yizhuang Zhou et al.
Delving into Multimodal Prompting for Fine-Grained Visual Classification
Xin Jiang, Hao Tang, Junyao Gao et al.
Domain-Controlled Prompt Learning
Qinglong Cao, Zhengqin Xu, Yuntian Chen et al.
Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models
Yubin Wang, Xinyang Jiang, De Cheng et al.
Multi-Prompts Learning with Cross-Modal Alignment for Attribute-Based Person Re-identification
Yajing Zhai, Yawen Zeng, Zhiyong Huang et al.
p-Laplacian Adaptation for Generative Pre-trained Vision-Language Models
Haoyuan Wu, Xinyun Zhang, Peng Xu et al.
Semantic-Aware Data Augmentation for Text-to-Image Synthesis
Zhaorui Tan, Xi Yang, Kaizhu Huang
Simple Image-Level Classification Improves Open-Vocabulary Object Detection
Ruohuan Fang, Guansong Pang, Xiao Bai
VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection
Peng Wu, Xuerong Zhou, Guansong Pang et al.
Weak Distribution Detectors Lead to Stronger Generalizability of Vision-Language Prompt Tuning
Kun Ding, Haojian Zhang, Qiang Yu et al.