2024 "vision-language models" Papers

89 papers found • Page 2 of 2

Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning

Shibo Jie, Yehui Tang, Ning Ding et al.

ICML 2024 • poster

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Kaining Ying, Fanqing Meng, Jin Wang et al.

ICML 2024 • poster

Modeling Caption Diversity in Contrastive Vision-Language Pretraining

Samuel Lavoie, Polina Kirichenko, Mark Ibrahim et al.

ICML 2024 • poster

MTA-CLIP: Language-Guided Semantic Segmentation with Mask-Text Alignment

Anurag Das, Xinting Hu, Li Jiang et al.

ECCV 2024 • poster • arXiv:2407.21654 • 11 citations

Multi-Prompts Learning with Cross-Modal Alignment for Attribute-Based Person Re-identification

Yajing Zhai, Yawen Zeng, Zhiyong Huang et al.

AAAI 2024 • paper • arXiv:2312.16797 • 33 citations

OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

Zhening Huang, Xiaoyang Wu, Xi Chen et al.

ECCV 2024 • poster • arXiv:2309.00616 • 82 citations

Open-Vocabulary Calibration for Fine-tuned CLIP

Shuoyuan Wang, Jindong Wang, Guoqing Wang et al.

ICML 2024 • poster

Open Vocabulary Multi-Label Video Classification

Rohit Gupta, Mamshad Nayeem Rizve, Jayakrishnan Unnikrishnan et al.

ECCV 2024 • poster • arXiv:2407.09073 • 5 citations

p-Laplacian Adaptation for Generative Pre-trained Vision-Language Models

Haoyuan Wu, Xinyun Zhang, Peng Xu et al.

AAAI 2024 • paper • arXiv:2312.10613

Position: The Platonic Representation Hypothesis

Minyoung Huh, Brian Cheung, Tongzhou Wang et al.

ICML 2024 • poster

Prioritized Semantic Learning for Zero-shot Instance Navigation

Xinyu Sun, Lizhao Liu, Hongyan Zhi et al.

ECCV 2024 • poster • arXiv:2403.11650 • 22 citations

Quantized Prompt for Efficient Generalization of Vision-Language Models

Tianxiang Hao, Xiaohan Ding, Juexiao Feng et al.

ECCV 2024 • poster • arXiv:2407.10704 • 9 citations

Realistic Unsupervised CLIP Fine-tuning with Universal Entropy Optimization

Jian Liang, Lijun Sheng, Zhengbo Wang et al.

ICML 2024 • spotlight

Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving

Ming Nie, Renyuan Peng, Chunwei Wang et al.

ECCV 2024 • poster • arXiv:2312.03661 • 112 citations

Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion

Xuantong Liu, Tianyang Hu, Wenjia Wang et al.

ICML 2024 • poster

Region-centric Image-Language Pretraining for Open-Vocabulary Detection

Dahun Kim, Anelia Angelova, Weicheng Kuo

ECCV 2024 • poster • arXiv:2310.00161 • 6 citations

REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models

Agneet Chatterjee, Yiran Luo, Tejas Gokhale et al.

ECCV 2024 • poster • arXiv:2408.02231 • 10 citations

Revisiting the Role of Language Priors in Vision-Language Models

Zhiqiu Lin, Xinyue Chen, Deepak Pathak et al.

ICML 2024 • poster

Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models

Christian Schlarmann, Naman Singh, Francesco Croce et al.

ICML 2024 • poster

Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models

Yongshuo Zong, Ondrej Bohdal, Tingyang Yu et al.

ICML 2024 • poster

SceneCraft: An LLM Agent for Synthesizing 3D Scenes as Blender Code

Ziniu Hu, Ahmet Iscen, Aashi Jain et al.

ICML 2024 • poster

Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models

Yu-Chu Yu, Chi-Pin Huang, Jr-Jen Chen et al.

ECCV 2024 • poster • arXiv:2403.09296 • 16 citations

Semantic-Aware Data Augmentation for Text-to-Image Synthesis

Zhaorui Tan, Xi Yang, Kaizhu Huang

AAAI 2024 • paper • arXiv:2312.07951 • 4 citations

Simple Image-Level Classification Improves Open-Vocabulary Object Detection

Ruohuan Fang, Guansong Pang, Xiao Bai

AAAI 2024 • paper • arXiv:2312.10439 • 22 citations

SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant

Guohao Sun, Can Qin, Jiamian Wang et al.

ECCV 2024 • poster • arXiv:2403.11299 • 23 citations

SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment

Ziping Ma, Furong Xu, Jian Liu et al.

ICML 2024 • poster

The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?

Qinyu Zhao, Ming Xu, Kartik Gupta et al.

ECCV 2024 • poster • arXiv:2403.09037 • 15 citations

The Hard Positive Truth about Vision-Language Compositionality

Amita Kamath, Cheng-Yu Hsieh, Kai-Wei Chang et al.

ECCV 2024 • poster • arXiv:2409.17958 • 15 citations

Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models

Jiaqi Xu, Mengyang Wu, Xiaowei Hu et al.

ECCV 2024 • poster • arXiv:2409.02101 • 12 citations

Training-free Video Temporal Grounding using Large-scale Pre-trained Models

Minghang Zheng, Xinhao Cai, Qingchao Chen et al.

ECCV 2024 • poster • arXiv:2408.16219 • 20 citations

Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models

Chen Ju, Haicheng Wang, Haozhe Cheng et al.

ECCV 2024 • poster • arXiv:2407.11717 • 12 citations

Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models

Yifei Ming, Sharon Li

ICML 2024 • poster

Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image

Pengkun Jiao, Na Zhao, Jingjing Chen et al.

ECCV 2024 • poster • arXiv:2407.05256 • 13 citations

Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Models

Hao Cheng, Erjia Xiao, Jindong Gu et al.

ECCV 2024 • poster • arXiv:2402.19150 • 15 citations

VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection

Peng Wu, Xuerong Zhou, Guansong Pang et al.

AAAI 2024 • paper • arXiv:2308.11681 • 156 citations

VCP-CLIP: A Visual Context Prompting Model for Zero-Shot Anomaly Segmentation

Zhen Qu, Xian Tao, Mukesh Prasad et al.

ECCV 2024 • poster • arXiv:2407.12276 • 55 citations

Visual Grounding for Object-Level Generalization in Reinforcement Learning

Haobin Jiang, Zongqing Lu

ECCV 2024 • poster • arXiv:2408.01942 • 3 citations

Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models

Jinhao Li, Haopeng Li, Sarah Erfani et al.

ICML 2024 • poster

Weak Distribution Detectors Lead to Stronger Generalizability of Vision-Language Prompt Tuning

Kun Ding, Haojian Zhang, Qiang Yu et al.

AAAI 2024 • paper • arXiv:2404.00603 • 7 citations