"vision-language models" Papers
363 papers found • Page 6 of 8
VladVA: Discriminative Fine-tuning of LVLMs
Yassine Ouali, Adrian Bulat, Alexandros Xenos et al.
VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving
Ruifei Zhang, Wei Zhang, Xiao Tan et al.
VLMaterial: Procedural Material Generation with Large Vision-Language Models
Beichen Li, Rundi Wu, Armando Solar-Lezama et al.
VLMs can Aggregate Scattered Training Patches
Zhanhui Zhou, Lingjie Chen, Chao Yang et al.
Vocabulary-Guided Gait Recognition
Panjian Huang, Saihui Hou, Chunshui Cao et al.
VT-FSL: Bridging Vision and Text with LLMs for Few-Shot Learning
Wenhao Li, Qiangchang Wang, Xianjing Meng et al.
Weakly-Supervised Learning of Dense Functional Correspondences
Stefan Stojanov, Linan Zhao, Yunzhi Zhang et al.
What Makes a Maze Look Like a Maze?
Joy Hsu, Jiayuan Mao, Joshua B Tenenbaum et al.
Why 1 + 1 < 1 in Visual Token Pruning: Beyond Naive Integration via Multi-Objective Balanced Covering
Yangfu Li, Hongjian Zhan, Tianyi Chen et al.
Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
Ailin Deng, Tri Cao, Zhirui Chen et al.
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
Seil Kang, Jinyeong Kim, Junhyeok Kim et al.
Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts
Yanting Yang, Minghao Chen, Qibo Qiu et al.
Adaptive Multi-task Learning for Few-shot Object Detection
Yan Ren, Yanling Li, Wai-Kin Adams Kong
Adapt without Forgetting: Distill Proximity from Dual Teachers in Vision-Language Models
Mengyu Zheng, Yehui Tang, Zhiwei Hao et al.
Adversarial Prompt Tuning for Vision-Language Models
Jiaming Zhang, Xingjun Ma, Xin Wang et al.
Amend to Alignment: Decoupled Prompt Tuning for Mitigating Spurious Correlation in Vision-Language Models
Jie Zhang, Xiaosong Ma, Song Guo et al.
A Multimodal Automated Interpretability Agent
Tamar Rott Shaham, Sarah Schwettmann, Franklin Wang et al.
An Empirical Study Into What Matters for Calibrating Vision-Language Models
Weijie Tu, Weijian Deng, Dylan Campbell et al.
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Liang Chen, Haozhe Zhao, Tianyu Liu et al.
AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models
Zhaopeng Gu, Bingke Zhu, Guibo Zhu et al.
ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations
Kailas Vodrahalli, James Zou
A Touch, Vision, and Language Dataset for Multimodal Alignment
Letian Fu, Gaurav Datta, Huang Huang et al.
Attention Prompting on Image for Large Vision-Language Models
Runpeng Yu, Weihao Yu, Xinchao Wang
Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models
Zhihe Lu, Jiawang Bai, Xin Li et al.
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
Ian Huang, Guandao Yang, Leonidas Guibas
Bridging Environments and Language with Rendering Functions and Vision-Language Models
Théo Cachet, Christopher Dance, Olivier Sigaud
Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data
Jiahan Zhang, Qi Wei, Feng Liu et al.
Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation
Yunheng Li, Zhong-Yu Li, Quan-Sheng Zeng et al.
CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts
Yichao Cai, Yuhang Liu, Zhen Zhang et al.
CLIM: Contrastive Language-Image Mosaic for Region Representation
Size Wu, Wenwei Zhang, Lumin Xu et al.
Code as Reward: Empowering Reinforcement Learning with VLMs
David Venuto, Mohammad Sami Nur Islam, Martin Klissarov et al.
Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation
Siyu Jiao, Hongguang Zhu, Yunchao Wei et al.
COMMA: Co-articulated Multi-Modal Learning
Lianyu Hu, Liqing Gao, Zekang Liu et al.
Compound Text-Guided Prompt Tuning via Image-Adaptive Cues
Hao Tan, Jun Li, Yizhuang Zhou et al.
Conceptual Codebook Learning for Vision-Language Models
Yi Zhang, Ke Yu, Siqi Wu et al.
Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models
Zhengbo Wang, Jian Liang, Ran He et al.
Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
Lorenzo Baraldi, Federico Cocchi, Marcella Cornia et al.
DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection
Zhi Zhou, Ming Yang, Jiang-Xin Shi et al.
Delving into Multimodal Prompting for Fine-Grained Visual Classification
Xin Jiang, Hao Tang, Junyao Gao et al.
Domain-Controlled Prompt Learning
Qinglong Cao, Zhengqin Xu, Yuntian Chen et al.
Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior
Shuyu Cheng, Yibo Miao, Yinpeng Dong et al.
Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection
Chentao Cao, Zhun Zhong, Zhanke Zhou et al.
Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models
Mingrui Wu, Jiayi Ji, Oucheng Huang et al.
EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding
Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu et al.
Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models
Minchan Kim, Minyeong Kim, Junik Bae et al.
Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation
Tong Shao, Zhuotao Tian, Hang Zhao et al.
Exploring Intrinsic Dimension for Vision-Language Model Pruning
Hanzhang Wang, Jiawen Zhang, Qingyuan Ma
Extracting Training Data From Document-Based VQA Models
Francesco Pinto, Nathalie Rauschmayr, Florian Tramer et al.
Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments
Djamahl Etchegaray, Zi Helen Huang, Tatsuya Harada et al.
FineMatch: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
Hang Hua, Jing Shi, Kushal Kafle et al.