2024 "vision language models" Papers

12 papers found

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

Wenbo Hu, Yifan Xu, Yi Li et al.

AAAI 2024paperarXiv:2308.09936
190
citations

Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using VLMs

Jeongkee Lim, Yusung Kim

ECCV 2024posterarXiv:2408.02261
5
citations

Detecting and Preventing Hallucinations in Large Vision Language Models

Anisha Gunjal, Jihan Yin, Erhan Bas

AAAI 2024paperarXiv:2308.06394
256
citations

Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View

Jin Wang, Shichao Dong, Yapeng Zhu et al.

ICML 2024posterarXiv:2405.17201

LCA-on-the-Line: Benchmarking Out of Distribution Generalization with Class Taxonomies

Jia Shi, Gautam Rajendrakumar Gare, Jinjin Tian et al.

ICML 2024poster

Leveraging VLM-Based Pipelines to Annotate 3D Objects

Rishabh Kabra, Loic Matthey, Alexander Lerchner et al.

ICML 2024posterarXiv:2311.17851

Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment

Brian Gordon, Yonatan Bitton, Yonatan Shafir et al.

ECCV 2024posterarXiv:2312.03766
17
citations

PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

Soroush Nasiriany, Fei Xia, Wenhao Yu et al.

ICML 2024posterarXiv:2402.07872

ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for Open Vocabulary Object Detection

Joonhyun Jeong, Geondo Park, Jayeon Yoo et al.

AAAI 2024paperarXiv:2312.07266
16
citations

RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback

Yufei Wang, Zhanyi Sun, Jesse Zhang et al.

ICML 2024posterarXiv:2402.03681

Soft Prompt Generation for Domain Generalization

Shuanghao Bai, Yuedi Zhang, Wanqi Zhou et al.

ECCV 2024posterarXiv:2404.19286
30
citations

TrojVLM: Backdoor Attack Against Vision Language Models

Weimin Lyu, Lu Pang, Tengfei Ma et al.

ECCV 2024posterarXiv:2409.19232
23
citations