2025 Poster Papers
15,759 papers found • Page 307 of 316
Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models
Donghoon Kim, Minji Bae, Kyuhong Shim et al.
Visual Modality Prompt for Adapting Vision-Language Object Detectors
Heitor Rapela Medeiros, Atif Belal, Srikanth Muralidharan et al.
Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning
Minheng Ni, Yutao Fan, Lei Zhang et al.
Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models
Zhen Zeng, Leijiang Gu, Xun Yang et al.
Visual Persona: Foundation Model for Full-Body Human Customization
Jisu Nam, Soowon Son, Zhan Xu et al.
VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning
Yichao Liang, Nishanth Kumar, Hao Tang et al.
Visual Prompting for One-shot Controllable Video Editing without Inversion
Zhengbo Zhang, Yuxi Zhou, Duo Peng et al.
Visual Relation Diffusion for Human-Object Interaction Detection
Ping Cao, Yepeng Tang, Chunjie Zhang et al.
Visual-RFT: Visual Reinforcement Fine-Tuning
Ziyu Liu, Zeyi Sun, Yuhang Zang et al.
Visual Structures Help Visual Reasoning: Addressing the Binding Problem in LVLMs
Amirmohammad Izadi, Mohammadali Banayeeanzade, Fatemeh Askari et al.
Visual Surface Wave Elastography: Revealing Subsurface Physical Properties via Visible Surface Waves
Alexander Ogren, Berthy Feng, Jihoon Ahn et al.
Visual Sync: Multi-Camera Synchronization via Cross-View Object Motion
Shaowei Liu, David Yao, Saurabh Gupta et al.
Visual Textualization for Image Prompted Object Detection
Yongjian Wu, Yang Zhou, Jiya Saiyin et al.
Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought
Zihui Cheng, Qiguang Chen, Xiao Xu et al.
VITA-Audio: Fast Interleaved Audio-Text Token Generation for Efficient Large Speech-Language Model
Zuwei Long, Yunhang Shen, Chaoyou Fu et al.
ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy
Kian Kenyon-Dean, Zitong Jerry Wang, John Urbanik et al.
VITAL: More Understandable Feature Visualization through Distribution Alignment and Relevant Information Flow
Ada Görgün, Bernt Schiele, Jonas Fischer
VITED: Video Temporal Evidence Distillation
Yujie Lu, Yale Song, Lorenzo Torresani et al.
ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers
Hanwen Cao, Haobo Lu, Xiaosen Wang et al.
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
Guoyizhe Wei, Rama Chellappa
VITRIX-CLIPIN: Enhancing Fine-Grained Visual Understanding in CLIP via Instruction-Editing Data and Long Captions
Ziteng Wang, Siqi Yang, Limeng Qiao et al.
ViT-Split: Unleashing the Power of Vision Foundation Models via Efficient Splitting Heads
Yifan Li, Xin Li, Tianqin Li et al.
ViUniT: Visual Unit Tests for More Robust Visual Programming
Artemis Panagopoulou, Honglu Zhou, Silvio Savarese et al.
Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting
Jiaxin Huang, Sheng Miao, Bangbang Yang et al.
VL2Lite: Task-Specific Knowledge Distillation from Large Vision-Language Models to Lightweight Networks
Jinseong Jang, Chunfei Ma, Byeongwon Lee
VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks
Shiduo Zhang, Zhe Xu, Peiju Liu et al.
VladVA: Discriminative Fine-tuning of LVLMs
Yassine Ouali, Adrian Bulat, Alexandros Xenos et al.
VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models
Chongkai Gao, Zixuan Liu, Zhenghao Chi et al.
VLAS: Vision-Language-Action Model with Speech Instructions for Customized Robot Manipulation
Wei Zhao, Pengxiang Ding, Zhang Min et al.
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
Dezhan Tu, Danylo Vashchilenko, Yuzhe Lu et al.
VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving
Ruifei Zhang, Wei Zhang, Xiao Tan et al.
VLForgery Face Triad: Detection, Localization and Attribution via Multimodal Large Language Models
Xinan He, Yue Zhou, Bing Fan et al.
VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
Yongshuo Zong, Ondrej Bohdal, Timothy Hospedales
VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior
Xindi Yang, Baolu Li, Yiming Zhang et al.
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
Ziyan Jiang, Rui Meng, Xinyi Yang et al.
VLM4D: Towards Spatiotemporal Awareness in Vision Language Models
Shijie Zhou, Alexander Vilesov, Xuehai He et al.
VLMaterial: Procedural Material Generation with Large Vision-Language Models
Beichen Li, Rundi Wu, Armando Solar-Lezama et al.
VLM in a flash: I/O-Efficient Sparsification of Vision-Language Model via Neuron Chunking
Kichang Yang, Seonjun Kim, Minjae Kim et al.
VLMLight: Safety-Critical Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning Architecture
Maonan Wang, Yirong Chen, Aoyu Pang et al.
VLM-R³: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought
Chaoya Jiang, Yongrui Heng, Wei Ye et al.
VLMs can Aggregate Scattered Training Patches
Zhanhui Zhou, Lingjie Chen, Chao Yang et al.
VLMs-Guided Representation Distillation for Efficient Vision-Based Reinforcement Learning
Haoran Xu, Peixi Peng, Guang Tan et al.
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis
Enric Corona, Andrei Zanfir, Eduard Gabriel Bazavan et al.
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
Kevin Qinghong Lin, Mike Zheng Shou
VLR-Driver: Large Vision-Language-Reasoning Models for Embodied Autonomous Driving
Fanjie Kong, Yitong Li, Weihuang Chen et al.
VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models
Jiacheng Ruan, Wenzhen Yuan, Xian Gao et al.
VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set
Shufan Shen, Junshu Sun, Qingming Huang et al.
VL-SAM-V2: Open-World Object Detection with General and Specific Query Fusion
Zhiwei Lin, Yongtao Wang
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Byung-Kwan Lee, Ryo Hachiuma, Yu-Chiang Frank Wang et al.
VMBench: A Benchmark for Perception-Aligned Video Motion Generation
Xinran Ling, Chen Zhu, Meiqi Wu et al.