2025 Poster Papers
Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks
Haijin Zeng, Xiangming Wang, Yongyong Chen et al.
Vision-Language Interactive Relation Mining for Open-Vocabulary Scene Graph Generation
Yukuan Min, Muli Yang, Jinhao Zhang et al.
Vision-Language Model IP Protection via Prompt-based Learning
Lianyu Wang, Meng Wang, Huazhu Fu et al.
Vision-Language Models Can't See the Obvious
Yasser Abdelaziz Dahou Djilali, Ngoc Huynh, Phúc Lê Khắc et al.
Vision-Language Models Create Cross-Modal Task Representations
Grace Luo, Trevor Darrell, Amir Bar
Vision-Language Models Do Not Understand Negation
Kumail Alhamoud, Shaden Alshammari, Yonglong Tian et al.
Vision-Language Model Selection and Reuse for Downstream Adaptation
Hao-Zhe Tan, Zhi Zhou, Yu-Feng Li et al.
Vision-Language Neural Graph Featurization for Extracting Retinal Lesions
Taimur Hassan, Anabia Sohail, Muzammal Naseer et al.
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models
Tiezheng Zhang, Yitong Li, Yu-Cheng Chou et al.
Vision-LSTM: xLSTM as Generic Vision Backbone
Benedikt Alkin, Maximilian Beck, Korbinian Pöppel et al.
VisionMath: Vision-Form Mathematical Problem-Solving
Zongyang Ma, Yuxin Chen, Ziqi Zhang et al.
VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving
Haiming Zhang, Wending Zhou et al.
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Yuchen Duan, Weiyun Wang, Zhe Chen et al.
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
Senqiao Yang, Junyi Li, Xin Lai et al.
VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters
Mouxiang Chen, Lefei Shen, Zhuo Li et al.
VISION-XL: High Definition Video Inverse Problem Solver using Latent Image Diffusion Models
Taesung Kwon, Jong Chul Ye
VisionZip: Longer is Better but Not Necessary in Vision Language Models
Senqiao Yang, Yukang Chen, Zhuotao Tian et al.
VisNumBench: Evaluating Number Sense of Multimodal Large Language Models
Tengjin Weng, Jingyi Wang, Wenhao Jiang et al.
VISO: Accelerating In-orbit Object Detection with Language-Guided Mask Learning and Sparse Inference
Meiqi Wang, Han Qiu
ViSpeak: Visual Instruction Feedback in Streaming Videos
Shenghao Fu, Qize Yang, Yuan-Ming Li et al.
ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding
Jialiang Kang, Han Shu, Wenshuo Li et al.
ViSPLA: Visual Iterative Self-Prompting for Language-Guided 3D Affordance Learning
Hritam Basak, Zhaozheng Yin
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
Shi Yu, Chaoyue Tang, Bokai Xu et al.
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen, Xufang Luo, Dongsheng Li
VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging
Yufan He, Pengfei Guo, Yucheng Tang et al.
VistaDream: Sampling multiview consistent images for single-view scene reconstruction
Haiping Wang, Yuan Liu, Ziwei Liu et al.
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
Weiming Ren, Huan Yang, Jie Min et al.
VISTREAM: Improving Computation Efficiency of Visual Streaming Perception via Law-of-Charge-Conservation Inspired Spiking Neural Network
Kang You, Ziling Wei, Jing Yan et al.
Visual Abstraction: A Plug-and-Play Approach for Text-Visual Retrieval
Guofeng Ding, Yiding Lu, Peng Hu et al.
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
Xiao Liu, Tianjie Zhang, Yu Gu et al.
Visual Agentic AI for Spatial Reasoning with a Dynamic API
Damiano Marsili, Rohun Agrawal, Yisong Yue et al.
Visual Agents as Fast and Slow Thinkers
Guangyan Sun, Mingyu Jin, Zhenting Wang et al.
Visual Anagrams Reveal Hidden Differences in Holistic Shape Processing Across Vision Models
Fenil Doshi, Thomas Fel, Talia Konkle et al.
Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning
Huajie Jiang, Zhengxian Li, Xiaohan Yu et al.
Visual Attention Never Fades: Selective Progressive Attention ReCalibration for Detailed Image Captioning in Multimodal Large Language Models
Mingi Jung, Saehyung Lee, Eunji Kim et al.
Visual Autoregressive Modeling for Image Super-Resolution
Yunpeng Qu, Kun Yuan, Jinhua Hao et al.
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
Zhong-Yu Li, Ruoyi Du, Juncheng Yan et al.
Visual Consensus Prompting for Co-Salient Object Detection
Jie Wang, Nana Yu, Zihao Zhang et al.
Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs
Sreyan Ghosh, Chandra Kiran Evuru, Sonal Kumar et al.
Visual Diversity and Region-aware Prompt Learning for Zero-shot HOI Detection
Chanhyeong Yang, Taehoon Song, Jihwan Park et al.
Visual Generation Without Guidance
Huayu Chen, Kai Jiang, Kaiwen Zheng et al.
Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark
Tsung-Han Wu, Giscard Biamby, Jerome Quenum et al.
Visual-Instructed Degradation Diffusion for All-in-One Image Restoration
Haina Qin, Wenyang Luo, Zewen Chen et al.
Visual Instruction Bottleneck Tuning
Changdae Oh, Jiatong Li, Shawn Im et al.
Visual Intention Grounding for Egocentric Assistants
Pengzhan Sun, Junbin Xiao, Tze Ho Elden Tse et al.
Visual Interestingness Decoded: How GPT-4o Mirrors Human Interests
Fitim Abdullahu, Helmut Grabner
Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting
Anand Bhattad, Konpat Preechakul, Alexei Efros
VisualLens: Personalization through Task-Agnostic Visual History
Wang Bill Zhu, Deqing Fu, Kai Sun et al.
Visual Lexicon: Rich Image Features in Language Space
XuDong Wang, Xingyi Zhou, Alireza Fathi et al.
Visually Consistent Hierarchical Image Classification
Seulki Park, Youren Zhang, Stella Yu et al.