2025 Papers
21,856 papers found • Page 426 of 438
VLForgery Face Triad: Detection, Localization and Attribution via Multimodal Large Language Models
Xinan He, Yue Zhou, Bing Fan et al.
VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
Yongshuo Zong, Ondrej Bohdal, Timothy Hospedales
VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior
Xindi Yang, Baolu Li, Yiming Zhang et al.
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
Ziyan Jiang, Rui Meng, Xinyi Yang et al.
VLM4D: Towards Spatiotemporal Awareness in Vision Language Models
Shijie Zhou, Alexander Vilesov, Xuehai He et al.
VLMaterial: Procedural Material Generation with Large Vision-Language Models
Beichen Li, Rundi Wu, Armando Solar-Lezama et al.
VLM in a flash: I/O-Efficient Sparsification of Vision-Language Model via Neuron Chunking
Kichang Yang, Seonjun Kim, Minjae Kim et al.
VLMLight: Safety-Critical Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning Architecture
Maonan Wang, Yirong Chen, Aoyu Pang et al.
VLM-R³: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought
Chaoya Jiang, Yongrui Heng, Wei Ye et al.
VLMs can Aggregate Scattered Training Patches
Zhanhui Zhou, Lingjie Chen, Chao Yang et al.
VLMs-Guided Representation Distillation for Efficient Vision-Based Reinforcement Learning
Haoran Xu, Peixi Peng, Guang Tan et al.
VLMs have Tunnel Vision: Evaluating Nonlocal Visual Reasoning in Leading VLMs
Shmuel Berman, Jia Deng
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis
Enric Corona, Andrei Zanfir, Eduard Gabriel Bazavan et al.
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
Kevin Qinghong Lin, Mike Zheng Shou
VLR-Driver: Large Vision-Language-Reasoning Models for Embodied Autonomous Driving
Fanjie Kong, Yitong Li, Weihuang Chen et al.
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Haozhe Wang, Chao Qu, Zuming Huang et al.
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
Lei Li, wei yuancheng, Zhihui Xie et al.
VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models
JIACHENG RUAN, Wenzhen Yuan, Xian Gao et al.
VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set
Shufan Shen, Junshu Sun, Qingming Huang et al.
VL-SAM-V2: Open-World Object Detection with General and Specific Query Fusion
Zhiwei Lin, Yongtao Wang
VLScene: Vision-Language Guidance Distillation for Camera-Based 3D Semantic Scene Completion
Meng Wang, Huilong Pi, Ruihui Li et al.
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Byung-Kwan Lee, Ryo Hachiuma, Yu-Chiang Frank Wang et al.
VMBench: A Benchmark for Perception-Aligned Video Motion Generation
Xinran Ling, Chen Zhu, Meiqi Wu et al.
VMDT: Decoding the Trustworthiness of Video Foundation Models
Yujin Potter, Zhun Wang, Nicholas Crispino et al.
VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory
Runjia Li, Philip Torr, Andrea Vedaldi et al.
Vocabulary-Guided Gait Recognition
Panjian Huang, Saihui Hou, Chunshui Cao et al.
Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding
Qian Ma, Ruoxiang Xu, Yongqiang Cai
VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation under real Occlusions
Yash Garg, Saketh Bachu, Arindam Dutta et al.
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Xubing Ye, Yukang Gan, Xiaoke Huang et al.
VODiff: Controlling Object Visibility Order in Text-to-Image Generation
Dong Liang, Jinyuan Jia, Yuhao Liu et al.
VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
Kim Sung-Bin, Jeongsoo Choi, Puyuan Peng et al.
VOILA: Complexity-Aware Universal Segmentation of CT Images by Voxel Interacting with Language
Zishuo Wan, Yu Gao, Wanyuan Pang et al.
VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
Nilay Yilmaz, Maitreya Patel, Lawrence Luo et al.
VolFormer: Explore More Comprehensive Cube Interaction for Hyperspectral Image Restoration and Beyond
Dabing Yu, Zheng Gao
VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play
Zelai Xu, Ruize Zhang, Chao Yu et al.
VoluMe – Authentic 3D Video Calls from Live Gaussian Splat Prediction
Martin de La Gorce, Charlie Hewitt, Tibor Takács et al.
Volume-Aware Distance for Robust Similarity Learning
Shuo Chen, Chen Gong, Jun Li et al.
Volume Optimality in Conformal Prediction with Structured Prediction Sets
Chao Gao, Liren Shan, Vaidehi Srinivas et al.
Volume Tells: Dual Cycle-Consistent Diffusion for 3D Fluorescence Microscopy De-noising and Super-Resolution
ZELIN LI, Chenwei Wang, Zhaoke Huang et al.
Volume Transmission Implements Context Factorization to Target Online Credit Assignment and Enable Compositional Generalization
Matthew Bull, Po-Chen Kuo, Andrew Smith et al.
Volumetrically Consistent 3D Gaussian Rasterization
Chinmay Talegaonkar, Yash Belhe, Ravi Ramamoorthi et al.
VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions
Marko Mihajlovic, Siwei Zhang, Gen Li et al.
Volumetric Surfaces: Representing Fuzzy Geometries with Layered Meshes
Stefano Esposito, Anpei Chen, Christian Reiser et al.
Voronoi-grid-based Pareto Front Learning and Its Application to Collaborative Federated Learning
Mengmeng Chen, Xiaohu Wu, QIQI LIU et al.
VORTA: Efficient Video Diffusion via Routing Sparse Attention
Wenhao Sun, Rong-Cheng Tu, Yifu Ding et al.
VoteFlow: Enforcing Local Rigidity in Self-Supervised Scene Flow
Yancong Lin, Shiming Wang, Liangliang Nan et al.
Voter Priming Campaigns: Strategies, Equilibria, and Algorithms
Jonathan Shaki, Yonatan Aumann, Sarit Kraus
VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding
Minchao Jiang, Shunyu Jia, Jiaming Gu et al.
VOVTrack: Exploring the Potentiality in Raw Videos for Open-Vocabulary Multi-Object Tracking
Zekun Qian, Ruize Han, Junhui Hou et al.
VoxDet: Rethinking 3D Semantic Scene Completion as Dense Object Detection
Wuyang Li, Zhu Yu, Alexandre Alahi