Poster "vision-language models" Papers
Anchor-based Robust Finetuning of Vision-Language Models
Jinwei Han, Zhiwen Lin, Zhongyi Sun et al.
An Empirical Study Into What Matters for Calibrating Vision-Language Models
Weijie Tu, Weijian Deng, Dylan Campbell et al.
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Liang Chen, Haozhe Zhao, Tianyu Liu et al.
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
William Zhu, Keren Ye, Junjie Ke et al.
ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations
Kailas Vodrahalli, James Zou
A Touch, Vision, and Language Dataset for Multimodal Alignment
Letian Fu, Gaurav Datta, Huang Huang et al.
Attention Prompting on Image for Large Vision-Language Models
Runpeng Yu, Weihao Yu, Xinchao Wang
BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models
Ye-Bin Moon, Nam Hyeon-Woo, Wonseok Choi et al.
Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models
Zhihe Lu, Jiawang Bai, Xin Li et al.
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
Ian Huang, Guandao Yang, Leonidas Guibas
Bootstrapping Variational Information Pursuit with Large Language and Vision Models for Interpretable Image Classification
Aditya Chattopadhyay, Kwan Ho Ryan Chan, Rene Vidal
Bridging Environments and Language with Rendering Functions and Vision-Language Models
Théo Cachet, Christopher Dance, Olivier Sigaud
Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data
Jiahan Zhang, Qi Wei, Feng Liu et al.
Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation
Yunheng Li, Zhong-Yu Li, Quan-Sheng Zeng et al.
CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts
Yichao Cai, Yuhang Liu, Zhen Zhang et al.
Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation
Siyu Jiao, Hongguang Zhu, Yunchao Wei et al.
Conceptual Codebook Learning for Vision-Language Models
Yi Zhang, Ke Yu, Siqi Wu et al.
Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models
Zhengbo Wang, Jian Liang, Ran He et al.
Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering
Zaid Khan, Yun Fu
Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
Lorenzo Baraldi, Federico Cocchi, Marcella Cornia et al.
DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection
Zhi Zhou, Ming Yang, Jiang-Xin Shi et al.
Distilling Vision-Language Models on Millions of Videos
Yue Zhao, Long Zhao, Xingyi Zhou et al.
Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior
Shuyu Cheng, Yibo Miao, Yinpeng Dong et al.
Efficient Inference of Vision Instruction-Following Models with Elastic Cache
Zuyan Liu, Benlin Liu, Jiahui Wang et al.
Efficient Test-Time Adaptation of Vision-Language Models
Adilbek Karmanov, Dayan Guan, Shijian Lu et al.
Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection
Chentao Cao, Zhun Zhong, Zhanke Zhou et al.
Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models
Mingrui Wu, Jiayi Ji, Oucheng Huang et al.
EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding
Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu et al.
Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models
Minchan Kim, Minyeong Kim, Junik Bae et al.
Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation
Tong Shao, Zhuotao Tian, Hang Zhao et al.
Exploring Intrinsic Dimension for Vision-Language Model Pruning
Hanzhang Wang, Jiawen Zhang, Qingyuan Ma
Extracting Training Data From Document-Based VQA Models
Francesco Pinto, Nathalie Rauschmayr, Florian Tramer et al.
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Shengbang Tong, Zhuang Liu, Yuexiang Zhai et al.
FFF: Fixing Flawed Foundations in Contrastive Pre-Training Results in Very Strong Vision-Language Models
Adrian Bulat, Yassine Ouali, Georgios Tzimiropoulos
Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments
Djamahl Etchegaray, Zi Helen Huang, Tatsuya Harada et al.
FineMatch: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
Hang Hua, Jing Shi, Kushal Kafle et al.
Fool Your (Vision and) Language Model with Embarrassingly Simple Permutations
Yongshuo Zong, Tingyang Yu, Ruchika Chavhan et al.
FunQA: Towards Surprising Video Comprehension
Binzhu Xie, Sicheng Zhang, Zitang Zhou et al.
GalLop: Learning global and local prompts for vision-language models
Marc Lafon, Elias Ramzi, Clément Rambour et al.
Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction
Hao Li, Ying Chen, Yifei Chen et al.
Generalizing to Unseen Domains via Text-guided Augmentation
Daiqing Qi, Handong Zhao, Aidong Zhang et al.
GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model
Ling Li, Yu Ye, Bingchuan Jiang et al.
Gradient-based Visual Explanation for Transformer-based CLIP
Chenyang Zhao, Kun Wang, Xingyu Zeng et al.
HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding
Zhaorun Chen, Zhuokai Zhao, Hongyin Luo et al.
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
Tianrui Guan, Fuxiao Liu, Xiyang Wu et al.
Harmonizing Generalization and Personalization in Federated Prompt Learning
Tianyu Cui, Hongxia Li, Jingya Wang et al.
Image Hijacks: Adversarial Images can Control Generative Models at Runtime
Luke Bailey, Euan Ong, Stuart Russell et al.
Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions
Oindrila Saha, Grant Horn, Subhransu Maji
Improving fine-grained understanding in image-text pre-training
Ioana Bica, Anastasija Ilic, Matthias Bauer et al.
Improving Zero-Shot Generalization for CLIP with Variational Adapter
Ziqian Lu, Fengli Shen, Mushui Liu et al.