Poster "vision-language models" Papers

475 papers found • Page 6 of 10

SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding

Rong Li, Shijie Li, Lingdong Kong et al.

CVPR 2025 • arXiv:2412.04383

Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models

Ce Zhang, Zifu Wan, Zhehan Kan et al.

ICLR 2025 • arXiv:2502.06130

Self-Evolving Visual Concept Library using Vision-Language Critics

Atharva Sehgal, Patrick Yuan, Ziniu Hu et al.

CVPR 2025 • arXiv:2504.00185

Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models

Fushuo Huo, Wenchao Xu, Zhong Zhang et al.

ICLR 2025 • arXiv:2408.02032

Selftok-Zero: Reinforcement Learning for Visual Generation via Discrete and Autoregressive Visual Tokens

Bohan Wang, Mingze Zhou, Zhongqi Yue et al.

NEURIPS 2025

Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation

Reza Qorbani, Gianluca Villani, Theodoros Panagiotakopoulos et al.

CVPR 2025 • arXiv:2503.21780

SemiDAViL: Semi-supervised Domain Adaptation with Vision-Language Guidance for Semantic Segmentation

Hritam Basak, Zhaozheng Yin

CVPR 2025 • arXiv:2504.06389

Sherlock: Self-Correcting Reasoning in Vision-Language Models

Yi Ding, Ruqi Zhang

NEURIPS 2025 • arXiv:2505.22651

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

Hongbo Liu, Jingwen He, Yi Jin et al.

NEURIPS 2025 • arXiv:2506.21356

Should VLMs be Pre-trained with Image Data?

Sedrick Keh, Jean Mercat, Samir Yitzhak Gadre et al.

ICLR 2025 • arXiv:2503.07603

SketchMind: A Multi-Agent Cognitive Framework for Assessing Student-Drawn Scientific Sketches

Ehsan Latif, Zirak Khan, Xiaoming Zhai

NEURIPS 2025 • arXiv:2507.22904

Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves

Shihan Wu, Ji Zhang, Pengpeng Zeng et al.

CVPR 2025 • arXiv:2412.11509

Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping

Weili Zeng, Ziyuan Huang, Kaixiang Ji et al.

ICCV 2025 • arXiv:2503.21817

SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving

Xuesong Chen, Linjiang Huang, Tao Ma et al.

CVPR 2025 • arXiv:2505.16805

SPADE: Spatial-Aware Denoising Network for Open-vocabulary Panoptic Scene Graph Generation with Long- and Local-range Context Reasoning

Xin Hu, Ke Qin, Guiduo Duan et al.

ICCV 2025 • arXiv:2507.05798

SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models

Kevin Miller, Aditya Gangrade, Samarth Mishra et al.

CVPR 2025 • arXiv:2502.16911

Sparsity Outperforms Low-Rank Projections in Few-Shot Adaptation

Nairouz Mrabah, Nicolas Richet, Ismail Ayed et al.

ICCV 2025 • arXiv:2504.12436

SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning

Wufei Ma, Yu-Cheng Chou, Qihao Liu et al.

NEURIPS 2025 • arXiv:2504.20024

SPEX: Scaling Feature Interaction Explanations for LLMs

Justin S. Kang, Landon Butler, Abhineet Agarwal et al.

ICML 2025 • arXiv:2502.13870

SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning

Yang Liu, Ming Ma, Xiaomin Yu et al.

NEURIPS 2025 • arXiv:2505.12448

Statistics Caching Test-Time Adaptation for Vision-Language Models

Zenghao Guan, Yucan Zhou, Wu Liu et al.

NEURIPS 2025

Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation

Yong Liu, Song-Li Wu, Sule Bai et al.

ICCV 2025 • arXiv:2506.16058

STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding

Aaryan Garg, Akash Kumar, Yogesh S. Rawat

CVPR 2025 • arXiv:2502.20678

SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing

Ming Li, Xin Gu, Fan Chen et al.

ICCV 2025 • arXiv:2505.02370

Synthetic Data is an Elegant GIFT for Continual Vision-Language Models

Bin Wu, Wuxuan Shi, Jinqiao Wang et al.

CVPR 2025 • arXiv:2503.04229

TAB: Transformer Attention Bottlenecks enable User Intervention and Debugging in Vision-Language Models

Pooyan Rahmanzadehgervi, Hung Nguyen, Rosanne Liu et al.

ICCV 2025 • arXiv:2412.18675

TaiwanVQA: Benchmarking and Enhancing Cultural Understanding in Vision-Language Models

Hsin Yi Hsieh, Shang-Wei Liu, Chang-Chih Meng et al.

NEURIPS 2025

Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation

Luca Barsellotti, Lorenzo Bianchi, Nicola Messina et al.

ICCV 2025 • arXiv:2411.19331

Targeted Unlearning with Single Layer Unlearning Gradient

Zikui Cai, Yaoteng Tan, M. Salman Asif

ICML 2025 • arXiv:2407.11867

TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types

Jiankang Chen, Tianke Zhang, Changyi Liu et al.

ICLR 2025 • arXiv:2502.09925

Teaching Human Behavior Improves Content Understanding Abilities Of VLMs

Somesh Singh, Harini S I, Yaman Singla et al.

ICLR 2025

Teaching VLMs to Localize Specific Objects from In-context Examples

Sivan Doveh, Nimrod Shabtay, Eli Schwartz et al.

ICCV 2025 • arXiv:2411.13317

Test-Time Adaptation of Vision-Language Models for Open-Vocabulary Semantic Segmentation

Mehrdad Noori, David Osowiechi, Gustavo Vargas Hakim et al.

NEURIPS 2025 • arXiv:2505.21844

Text to Sketch Generation with Multi-Styles

Tengjie Li, Shikui Tu, Lei Xu

NEURIPS 2025 • arXiv:2511.04123

The Illusion of Progress? A Critical Look at Test-Time Adaptation for Vision-Language Models

Lijun Sheng, Jian Liang, Ran He et al.

NEURIPS 2025 • arXiv:2506.24000

The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs

Hong Li, Nanxi Li, Yuanjie Chen et al.

ICLR 2025 • arXiv:2410.01417

The Narrow Gate: Localized Image-Text Communication in Native Multimodal Models

Alessandro Serra, Francesco Ortu, Emanuele Panizon et al.

NEURIPS 2025 • arXiv:2412.06646

TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model

Cheng Yang, Yang Sui, Jinqi Xiao et al.

CVPR 2025 • arXiv:2503.18278

Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models

Young Kyun Jang, Ser-Nam Lim

ICCV 2025 • arXiv:2405.14715

Towards Higher Effective Rank in Parameter-Efficient Fine-tuning using Khatri-Rao Product

Paul Albert, Frederic Zhang, Hemanth Saratchandran et al.

ICCV 2025 • arXiv:2508.00230

Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark

Hao Guo, Xugong Qin, Jun Jie Ou Yang et al.

CVPR 2025 • arXiv:2512.20174

Towards Understanding How Knowledge Evolves in Large Vision-Language Models

Sudong Wang, Yunjian Zhang, Yao Zhu et al.

CVPR 2025 • arXiv:2504.02862

Training-Free Generation of Temporally Consistent Rewards from VLMs

Yinuo Zhao, Jiale Yuan, Zhiyuan Xu et al.

ICCV 2025 • arXiv:2507.04789

Training-Free Test-Time Adaptation via Shape and Style Guidance for Vision-Language Models

Shenglong Zhou, Manjiang Yin, Leiyu Sun et al.

NEURIPS 2025

TRAP: Targeted Redirecting of Agentic Preferences

Hangoo Kang, Jehyeok Yeon, Gagandeep Singh

NEURIPS 2025 • arXiv:2505.23518

Tri-MARF: A Tri-Modal Multi-Agent Responsive Framework for Comprehensive 3D Object Annotation

Jusheng Zhang, Yijia Fan, Zimo Wen et al.

NEURIPS 2025

TULIP: Token-length Upgraded CLIP

Ivona Najdenkoska, Mohammad Mahdi Derakhshani, Yuki Asano et al.

ICLR 2025 • arXiv:2410.10034

UIPro: Unleashing Superior Interaction Capability For GUI Agents

Hongxin Li, Jingran Su, Jingfan Chen et al.

ICCV 2025 • arXiv:2509.17328

Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction

Yunheng Li, Yuxuan Li, Quan-Sheng Zeng et al.

ICCV 2025 • arXiv:2412.06244

Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks

Nina Shvetsova, Arsha Nagrani, Bernt Schiele et al.

CVPR 2025 • arXiv:2503.18637
