Poster Papers matching "vision-language models"

475 papers found • Page 3 of 10

FedMGP: Personalized Federated Learning with Multi-Group Text-Visual Prompts

Weihao Bo, Yanpeng Sun, Yu Wang et al.

NEURIPS 2025 • arXiv:2511.00480

FedMVP: Federated Multimodal Visual Prompt Tuning for Vision-Language Models

Mainak Singha, Subhankar Roy, Sarthak Mehrotra et al.

ICCV 2025 • arXiv:2504.20860 • 2 citations

Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models

Xudong Li, Zihao Huang, Yan Zhang et al.

ICCV 2025 • arXiv:2409.05381 • 4 citations

FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity

Hang Hua, Qing Liu, Lingzhi Zhang et al.

CVPR 2025 • arXiv:2411.15411 • 18 citations

Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection

Jiawen Zhu, Yew-Soon Ong, Chunhua Shen et al.

ICCV 2025 • arXiv:2410.10289 • 14 citations

Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving

Yue Li, Meng Tian, Zhenyu Lin et al.

ICCV 2025 • arXiv:2503.21505 • 14 citations

Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs

Yifan Shen, Yuanzhe Liu, Jingyuan Zhu et al.

NEURIPS 2025 • arXiv:2506.21656 • 5 citations

FineLIP: Extending CLIP’s Reach via Fine-Grained Alignment with Longer Text Inputs

Mothilal Asokan, Kebin Wu, Fatima Albreiki

CVPR 2025 • arXiv:2504.01916 • 15 citations

FlySearch: Exploring how vision-language models explore

Adam Pardyl, Dominik Matuszek, Mateusz Przebieracz et al.

NEURIPS 2025 • arXiv:2506.02896 • 3 citations

From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit

Valérie Costa, Thomas Fel, Ekdeep S Lubana et al.

NEURIPS 2025 • arXiv:2506.03093 • 15 citations

From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration

Mingyang Song, Xiaoye Qu, Jiawei Zhou et al.

CVPR 2025 • arXiv:2503.12821 • 3 citations

From Panels to Prose: Generating Literary Narratives from Comics

Ragav Sachdeva, Andrew Zisserman

ICCV 2025 • arXiv:2503.23344 • 3 citations

GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs

Xinli Xu, Wenhang Ge, Dicong Qiu et al.

ICCV 2025 • arXiv:2412.11258 • 7 citations

Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model

Zhaochong An, Guolei Sun, Yun Liu et al.

CVPR 2025 • arXiv:2503.16282 • 10 citations

Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling

Tsung-Han (Patrick) Wu, Heekyung Lee, Jiaxin Ge et al.

NEURIPS 2025 • arXiv:2504.13169 • 10 citations

Generate, Transduct, Adapt: Iterative Transduction with VLMs

Oindrila Saha, Logan Lawrence, Grant Horn et al.

ICCV 2025 • arXiv:2501.06031 • 2 citations

Generating CAD Code with Vision-Language Models for 3D Designs

Kamel Alrashedy, Pradyumna Tambwekar, Zulfiqar Haider Zaidi et al.

ICLR 2025 • arXiv:2410.05340 • 27 citations

GenIR: Generative Visual Feedback for Mental Image Retrieval

Diji Yang, Minghao Liu, Chung-Hsiang Lo et al.

NEURIPS 2025 • arXiv:2506.06220

GeoRanker: Distance-Aware Ranking for Worldwide Image Geolocalization

Pengyue Jia, Seongheon Park, Song Gao et al.

NEURIPS 2025 • arXiv:2505.13731 • 4 citations

GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks

Haoqiang Kang, Enna Sachdeva, Piyush Gupta et al.

CVPR 2025 • arXiv:2503.06514 • 8 citations

Glance2Gaze: Efficient Vision-Language Models from Glance Fusion to Gaze Compression

Juan Chen, Honglin Liu, Yingying Ao et al.

NEURIPS 2025

Global-Local Tree Search in VLMs for 3D Indoor Scene Generation

Wei Deng, Mengshi Qi, Huadong Ma

CVPR 2025 • arXiv:2503.18476 • 16 citations

GLSim: Detecting Object Hallucinations in LVLMs via Global-Local Similarity

Seongheon Park, Sharon Li

NEURIPS 2025 • arXiv:2508.19972 • 2 citations

GoalLadder: Incremental Goal Discovery with Vision-Language Models

Alexey Zakharov, Shimon Whiteson

NEURIPS 2025 • arXiv:2506.16396 • 1 citation

Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions

He Zhu, Quyu Kong, Kechun Xu et al.

CVPR 2025 • arXiv:2504.04744 • 7 citations

Grounding Language with Vision: A Conditional Mutual Information Calibrated Decoding Strategy for Reducing Hallucinations in LVLMs

Hao Fang, Changle Zhou, Jiawei Kong et al.

NEURIPS 2025 • arXiv:2505.19678 • 9 citations

GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding

Rui Hu, Yuxuan Zhang, Lianghui Zhu et al.

ICCV 2025 • arXiv:2503.10596 • 5 citations

Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels

Yongshuo Zong, Qin Zhang, Dongsheng An et al.

CVPR 2025 • arXiv:2505.13788 • 3 citations

GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training

Tong Wei, Yijun Yang, Junliang Xing et al.

ICCV 2025 • arXiv:2503.08525 • 8 citations

GTR-Loc: Geospatial Text Regularization Assisted Outdoor LiDAR Localization

Shangshu Yu, Wen Li, Xiaotian Sun et al.

NEURIPS 2025

Hallucinatory Image Tokens: A Training-free EAZY Approach to Detecting and Mitigating Object Hallucinations in LVLMs

Liwei Che, Qingze T Liu, Jing Jia et al.

ICCV 2025 • arXiv:2503.07772 • 3 citations

Harnessing Frozen Unimodal Encoders for Flexible Multimodal Alignment

Mayug Maniparambil, Raiymbek Akshulakov, Yasser Abdelaziz Dahou Djilali et al.

CVPR 2025 • arXiv:2409.19425 • 2 citations

Hierarchical Cross-modal Prompt Learning for Vision-Language Models

Hao Zheng, Shunzhi Yang, Zhuoxin He et al.

ICCV 2025 • arXiv:2507.14976 • 6 citations

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

Runhui Huang, Xinpeng Ding, Chunwei Wang et al.

CVPR 2025 • arXiv:2407.08706 • 15 citations

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding

Chenxin Tao, Shiqian Su, Xizhou Zhu et al.

CVPR 2025 • arXiv:2412.16158 • 5 citations

HQA-VLAttack: Towards High Quality Adversarial Attack on Vision-Language Pre-Trained Models

Han Liu, Jiaqi Li, Zhi Xu et al.

NEURIPS 2025

HumorDB: Can AI understand graphical humor?

Vedaant V Jain, Gabriel Kreiman, Felipe Feitosa

ICCV 2025 • arXiv:2406.13564 • 1 citation

IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model

Yatai Ji, Shilong Zhang, Jie Wu et al.

ICLR 2025 • arXiv:2407.07577 • 8 citations

IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves

Ruofan Wang, Juncheng Li, Yixu Wang et al.

ICCV 2025 • arXiv:2411.00827 • 9 citations

Identifying and Mitigating Position Bias of Multi-image Vision-Language Models

Xinyu Tian, Shu Zou, Zhaoyuan Yang et al.

CVPR 2025 • arXiv:2503.13792 • 11 citations

iFinder: Structured Zero-Shot Vision-Based LLM Grounding for Dash-Cam Video Reasoning

Manyi Yao, Bingbing Zhuang, Sparsh Garg et al.

NEURIPS 2025 • arXiv:2509.19552 • 1 citation

ILIAS: Instance-Level Image retrieval At Scale

Giorgos Kordopatis-Zilos, Vladan Stojnić, Anna Manko et al.

CVPR 2025 • arXiv:2502.11748 • 15 citations

INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance

Chenwei Lin, Hanjia Lyu, Xian Xu et al.

ICCV 2025 • arXiv:2406.09105 • 4 citations

Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models

Hyundong Jin, Hyung Jin Chang, Eunwoo Kim

ICCV 2025 • arXiv:2508.00260

INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling

Xin Dong, Shichao Dong, Jin Wang et al.

ICCV 2025 • arXiv:2507.05056 • 3 citations

Intervene-All-Paths: Unified Mitigation of LVLM Hallucinations across Alignment Formats

Jiaye Qian, Ge Zheng, Yuchen Zhu et al.

NEURIPS 2025 • arXiv:2511.17254 • 2 citations

IterIS: Iterative Inference-Solving Alignment for LoRA Merging

Hongxu Chen, Zhen Wang, Runshi Li et al.

CVPR 2025 • arXiv:2411.15231 • 5 citations

Know "No" Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP

Junsung Park, Jungbeom Lee, Jongyoon Song et al.

ICCV 2025 • arXiv:2501.10913 • 14 citations

Language-Assisted Feature Transformation for Anomaly Detection

EungGu Yun, Heonjin Ha, Yeongwoo Nam et al.

ICLR 2025 • arXiv:2503.01184 • 2 citations

Large (Vision) Language Models are Unsupervised In-Context Learners

Artyom Gadetsky, Andrei Atanov, Yulun Jiang et al.

ICLR 2025 • arXiv:2504.02349 • 3 citations