2025 Papers
21,856 papers found • Page 423 of 438
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Bencheng Liao, Xinggang Wang, Lianghui Zhu et al.
ViiNeuS: Volumetric Initialization for Implicit Neural Surface Reconstruction of Urban Scenes with Limited Image Overlap
Hala Djeghim, Nathan Piasco, Moussab Bennehar et al.
ViKIENet: Towards Efficient 3D Object Detection with Virtual Key Instance Enhanced Network
Zhuochen Yu, Bijie Qiu, Andy W. H. Khong
VIKING: Deep variational inference with stochastic projections
Samuel Matthiesen, Hrittik Roy, Nicholas Krämer et al.
VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
Li Kang, Xiufeng Song, Heng Zhou et al.
VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge
Vishwesh Nath, Wenqi Li, Dong Yang et al.
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
Yecheng Wu, Zhuoyang Zhang, Junyu Chen et al.
ViLLa: Video Reasoning Segmentation with Large Language Model
Rongkun Zheng, Lu Qi, Xi Chen et al.
ViLU: Learning Vision-Language Uncertainties for Failure Prediction
Marc Lafon, Yannis Karmim, Julio Silva-Rodríguez et al.
VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models
Haidong Xu, Guangwei Xu, Zhedong Zheng et al.
ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba
Juncan Deng, Shuaiting Li, Zeyu Wang et al.
VinaBench: Benchmark for Faithful and Consistent Visual Narratives
Silin Gao, Sheryl Mathew, Li Mi et al.
Vinci: Deep Thinking in Text-to-Image Generation using Unified Model with Reinforcement Learning
Wang Lin, Wentao Hu, Liyu Jia et al.
VinePPO: Refining Credit Assignment in RL Training of LLMs
Amirhossein Kazemnejad, Milad Aghajohari, Eva Portelance et al.
VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation
Saksham Singh Kushwaha, Yapeng Tian
Vintix: Action Model via In-Context Reinforcement Learning
Andrei Polubarov, Nikita Lyubaykin, Alexander Derevyagin et al.
VIoTGPT: Learning to Schedule Vision Tools Towards Intelligent Video Internet of Things
Yaoyao Zhong, Mengshi Qi, Rui Wang et al.
VIPAMIN: Visual Prompt Initialization via Embedding Selection and Subspace Expansion
Jaekyun Park, Hye Won Chung
ViPCap: Retrieval Text-Based Visual Prompts for Lightweight Image Captioning
Taewhan Kim, Soeun Lee, Si-Woo Kim et al.
VIPerson: Flexibly Generating Virtual Identity for Person Re-Identification
Xiao-Wen Zhang, Delong Zhang, Yi-Xing Peng et al.
V.I.P.: Iterative Online Preference Distillation for Efficient Video Diffusion Models
Jisoo Kim, Wooseok Seo, Junwan Kim et al.
ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction
Yi Feng, Yu Han, Xijing Zhang et al.
VIP: Vision Instructed Pre-training for Robotic Manipulation
Zhuoling Li, Liangliang Ren, Jinrong Yang et al.
VIRES: Video Instance Repainting via Sketch and Text Guided Generation
Shuchen Weng, Haojie Zheng, Peixuan Zhang et al.
Virtual Fitting Room: Generating Arbitrarily Long Videos of Virtual Try-On from a Single Image
Junkun Chen, Aayush Bansal, Minh Vo et al.
Virtual Museum Tour Agent: Effects of Responsiveness and Awareness
Anant Upadhyay, Fu-Chia Yang, Christos Mousas
Virtual Nodes Can Help: Tackling Distribution Shifts in Federated Graph Learning
Xingbo Fu, Zihan Chen, Yinhan He et al.
Virtual Pass-through: Evaluating 3D Gaussian Splatting as an Alternative to Conventional Video Pass-through in Static Environments
Andy Schleising, Christian Kunert, Tobias Schwandt et al.
Virtual Roomie: Immersive Layout Co-design with a Virtual Agent
Angela L. Jimenez, Pedro Acevedo, Christos Mousas
Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data
Zi Liang, Qingqing Ye, Xuan Liu et al.
ViSAGe: Video-to-Spatial Audio Generation
Jaeyeon Kim, Heeseung Yun, Gunhee Kim
Visceral Notices and Privacy Mechanisms for Eye Tracking in Augmented Reality
Nissi Otoo, Kailon Blue, G. Nikki Ramirez et al.
VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
Xueqing Wu, Yuheng Ding, Bingxuan Li et al.
VisDiff: SDF-Guided Polygon Generation for Visibility Reconstruction, Characterization and Recognition
Rahul Moorthy Mahesh, Jun-Jee Chao, Volkan Isler
VisHall3D: Monocular Semantic Scene Completion from Reconstructing the Visible Regions to Hallucinating the Invisible Regions
Haoang Lu, Yuanqi Su, Xiaoning Zhang et al.
Vision and Language Synergy for Rehearsal Free Continual Learning
Muhammad Anwar Masum, Mahardhika Pratama, Savitha Ramasamy et al.
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It
Yulu Qin, Dheeraj Varghese, Adam Dahlgren Lindström et al.
VisionArena: 230k Real World User-VLM Conversations with Preference Labels
Christopher Chou, Lisa Dunlap, Wei-Lin Chiang et al.
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
Jiaming Han, Hao Chen, Yang Zhao et al.
Vision-aware Multimodal Prompt Tuning for Uploadable Multi-source Few-shot Domain Adaptation
Kuanghong Liu, Jin Wang, Kangjian He et al.
Vision-Based Generic Potential Function for Policy Alignment in Multi-Agent Reinforcement Learning
Hao Ma, Shijie Wang, Zhiqiang Pu et al.
Vision-centric Token Compression in Large Language Model
Ling Xing, Alex Jinpeng Wang, Rui Yan et al.
Vision CNNs trained to estimate spatial latents learned similar ventral-stream-aligned representations
Yudi Xie, Weichen Huang, Esther Alter et al.
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Generation
Anlin Zheng, Xin Wen, Xuanyang Zhang et al.
Vision Function Layer in Multimodal LLMs
Cheng Shi, Yizhou Yu, Sibei Yang
Vision Graph Prompting via Semantic Low-Rank Decomposition
Zixiang Ai, Zichen Liu, Jiahuan Zhou
Vision-Guided Action: Enhancing 3D Human Motion Prediction with Gaze-informed Affordance in 3D Scenes
Ting Yu, Yi Lin, Jun Yu et al.
Vision-guided Text Mining for Unsupervised Cross-modal Hashing with Community Similarity Quantization
Haozhi Fan, Yuan Cao
Vision-Language Embodiment for Monocular Depth Estimation
Jinchang Zhang, Guoyu Lu
Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks
Haijin Zeng, Xiangming Wang, Yongyong Chen et al.