2025 Papers
Vision-Language Interactive Relation Mining for Open-Vocabulary Scene Graph Generation
Yukuan Min, Muli Yang, Jinhao Zhang et al.
Vision-Language Model IP Protection via Prompt-based Learning
Lianyu Wang, Meng Wang, Huazhu Fu et al.
Vision Language Models are In-Context Value Learners
Yecheng Jason Ma, Joey Hejna, Chuyuan Fu et al.
Vision-Language Models Can't See the Obvious
Yasser Abdelaziz Dahou Djilali, Ngoc Huynh, Phúc Lê Khắc et al.
Vision-Language Models Create Cross-Modal Task Representations
Grace Luo, Trevor Darrell, Amir Bar
Vision-Language Models Do Not Understand Negation
Kumail Alhamoud, Shaden Alshammari, Yonglong Tian et al.
Vision-Language Model Selection and Reuse for Downstream Adaptation
Hao-Zhe Tan, Zhi Zhou, Yu-Feng Li et al.
Vision-Language Neural Graph Featurization for Extracting Retinal Lesions
Taimur Hassan, Anabia Sohail, Muzammal Naseer et al.
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models
Tiezheng Zhang, Yitong Li, Yu-Cheng Chou et al.
Vision-LSTM: xLSTM as Generic Vision Backbone
Benedikt Alkin, Maximilian Beck, Korbinian Pöppel et al.
VisionMath: Vision-Form Mathematical Problem-Solving
Zongyang Ma, Yuxin Chen, Ziqi Zhang et al.
VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving
Haiming Zhang, Wending Zhou et al.
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Yuchen Duan, Weiyun Wang, Zhe Chen et al.
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
Senqiao Yang, Junyi Li, Xin Lai et al.
Vision Transformers Beat WideResNets on Small Scale Datasets Adversarial Robustness
Juntao Wu, Ziyu Song, Xiaoyu Zhang et al.
Vision Transformers Don't Need Trained Registers
Nicholas Jiang, Amil Dravid, Alexei Efros et al.
Vision Transformers with Self-Distilled Registers
Zipeng Yan, Yinjie Chen, Chong Zhou et al.
VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters
Mouxiang Chen, Lefei Shen, Zhuo Li et al.
VISION-XL: High Definition Video Inverse Problem Solver using Latent Image Diffusion Models
Taesung Kwon, Jong Chul Ye
VisionZip: Longer is Better but Not Necessary in Vision Language Models
Senqiao Yang, Yukang Chen, Zhuotao Tian et al.
VisNumBench: Evaluating Number Sense of Multimodal Large Language Models
Tengjin Weng, Jingyi Wang, Wenhao Jiang et al.
VISO: Accelerating In-orbit Object Detection with Language-Guided Mask Learning and Sparse Inference
Meiqi Wang, Han Qiu
ViSpeak: Visual Instruction Feedback in Streaming Videos
Shenghao Fu, Qize Yang, Yuan-Ming Li et al.
ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding
Jialiang Kang, Han Shu, Wenshuo Li et al.
ViSPLA: Visual Iterative Self-Prompting for Language-Guided 3D Affordance Learning
Hritam Basak, Zhaozheng Yin
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
Shi Yu, Chaoyue Tang, Bokai Xu et al.
VisRec: A Semi-Supervised Approach to Visibility Data Reconstruction in Radio Astronomy
Ruoqi Wang, Haitao Wang, Qiong Luo et al.
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen, Xufang Luo, Dongsheng Li
VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging
Yufan He, Pengfei Guo, Yucheng Tang et al.
VistaDream: Sampling multiview consistent images for single-view scene reconstruction
Haiping Wang, Yuan Liu, Ziwei Liu et al.
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
Weiming Ren, Huan Yang, Jie Min et al.
VISTREAM: Improving Computation Efficiency of Visual Streaming Perception via Law-of-Charge-Conservation Inspired Spiking Neural Network
Kang You, Ziling Wei, Jing Yan et al.
Visual Abstraction: A Plug-and-Play Approach for Text-Visual Retrieval
Guofeng Ding, Yiding Lu, Peng Hu et al.
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
Xiao Liu, Tianjie Zhang, Yu Gu et al.
Visual Agentic AI for Spatial Reasoning with a Dynamic API
Damiano Marsili, Rohun Agrawal, Yisong Yue et al.
Visual Agents as Fast and Slow Thinkers
Guangyan Sun, Mingyu Jin, Zhenting Wang et al.
Visual Anagrams Reveal Hidden Differences in Holistic Shape Processing Across Vision Models
Fenil Doshi, Thomas Fel, Talia Konkle et al.
Visual and Auditory Feedback of Vibration, and Particle Effects for Enhancing Pseudo-Haptic Button Interaction in VR
Myeongji Ko, Woojoo Kim
Visual and Domain Knowledge for Professional-level Graph-of-Thought Medical Reasoning
Rina Bao, Shilong Dong, Zhenfang Chen et al.
Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning
Huajie Jiang, Zhengxian Li, Xiaohan Yu et al.
Visual Attention Never Fades: Selective Progressive Attention ReCalibration for Detailed Image Captioning in Multimodal Large Language Models
Mingi Jung, Saehyung Lee, Eunji Kim et al.
Visual Autoregressive Modeling for Image Super-Resolution
Yunpeng Qu, Kun Yuan, Jinhua Hao et al.
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images
Boyang Deng, Kyle Genova, Songyou Peng et al.
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
Zhong-Yu Li, Ruoyi Du, Juncheng Yan et al.
Visual Consensus Prompting for Co-Salient Object Detection
Jie Wang, Nana Yu, Zihao Zhang et al.
Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs
Sreyan Ghosh, Chandra Kiran Evuru, Sonal Kumar et al.
Visual Diversity and Region-aware Prompt Learning for Zero-shot HOI Detection
Chanhyeong Yang, Taehoon Song, Jihwan Park et al.
Visual Generation Without Guidance
Huayu Chen, Kai Jiang, Kaiwen Zheng et al.
Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models
Zahra Babaiee, Peyman M. Kiasari, Daniela Rus et al.
Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark
Tsung-Han Wu, Giscard Biamby, Jerome Quenum et al.