NeurIPS Papers
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
Xuannan Liu, Zekun Li, Zheqi He et al.
Videos are Sample-Efficient Supervisions: Behavior Cloning from Videos via Latent Representations
Xin Liu, Haoran Li, Dongbin Zhao
VideoTitans: Scalable Video Prediction with Integrated Short- and Long-term Memory
Young-Jae Park, Minseok Seo, Hae-Gon Jeon
VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
Wenhao Wang, Yi Yang
VideoVLA: Video Generators Can Be Generalizable Robot Manipulators
Yichao Shen, Fangyun Wei, Zhiying Du et al.
Video World Models with Long-term Spatial Memory
Tong Wu, Shuai Yang, Ryan Po et al.
Vid-SME: Membership Inference Attacks against Large Video Understanding Models
Qi Li, Runpeng Yu, Xinchao Wang
ViewCraft3D: High-fidelity and View-Consistent 3D Vector Graphics Synthesis
Chuang Wang, Haitao Zhou, Ling Luo et al.
ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models
Zixun Fang, Kai Zhu, Zhiheng Liu et al.
VIKING: Deep variational inference with stochastic projections
Samuel Matthiesen, Hrittik Roy, Nicholas Krämer et al.
VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
Li Kang, Xiufeng Song, Heng Zhou et al.
VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models
Haidong Xu, Guangwei Xu, Zhedong Zheng et al.
Vinci: Deep Thinking in Text-to-Image Generation using Unified Model with Reinforcement Learning
Wang Lin, Wentao Hu, Liyu Jia et al.
VIPAMIN: Visual Prompt Initialization via Embedding Selection and Subspace Expansion
Jaekyun Park, Hye Won Chung
Virtual Fitting Room: Generating Arbitrarily Long Videos of Virtual Try-On from a Single Image
Junkun Chen, Aayush Bansal, Minh Vo et al.
Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data
Zi Liang, Qingqing Ye, Xuan Liu et al.
VisDiff: SDF-Guided Polygon Generation for Visibility Reconstruction, Characterization and Recognition
Rahul Moorthy Mahesh, Jun-Jee Chao, Volkan Isler
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It
Yulu Qin, Dheeraj Varghese, Adam Dahlgren Lindström et al.
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
Jiaming Han, Hao Chen, Yang Zhao et al.
Vision-centric Token Compression in Large Language Model
Ling Xing, Alex Jinpeng Wang, Rui Yan et al.
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Generation
Anlin Zheng, Xin Wen, Xuanyang Zhang et al.
Vision Function Layer in Multimodal LLMs
Cheng Shi, Yizhou Yu, Sibei Yang
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models
Tiezheng Zhang, Yitong Li, Yu-Cheng Chou et al.
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
Senqiao Yang, Junyi Li, Xin Lai et al.
Vision Transformers Don't Need Trained Registers
Nicholas Jiang, Amil Dravid, Alexei Efros et al.
Vision Transformers with Self-Distilled Registers
Zipeng Yan, Yinjie Chen, Chong Zhou et al.
ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding
Jialiang Kang, Han Shu, Wenshuo Li et al.
ViSPLA: Visual Iterative Self-Prompting for Language-Guided 3D Affordance Learning
Hritam Basak, Zhaozheng Yin
Visual Anagrams Reveal Hidden Differences in Holistic Shape Processing Across Vision Models
Fenil Doshi, Thomas Fel, Talia Konkle et al.
Visual Diversity and Region-aware Prompt Learning for Zero-shot HOI Detection
Chanhyeong Yang, Taehoon Song, Jihwan Park et al.
Visual Instruction Bottleneck Tuning
Changdae Oh, Jiatong Li, Shawn Im et al.
Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting
Anand Bhattad, Konpat Preechakul, Alexei Efros
VisualLens: Personalization through Task-Agnostic Visual History
Wang Bill Zhu, Deqing Fu, Kai Sun et al.
VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank
Tianhe Wu, Jian Zou, Jie Liang et al.
Visual Structures Help Visual Reasoning: Addressing the Binding Problem in LVLMs
Amirmohammad Izadi, Mohammadali Banayeeanzade, Fatemeh Askari et al.
Visual Sync: Multi-Camera Synchronization via Cross-View Object Motion
Shaowei Liu, David Yao, Saurabh Gupta et al.
Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought
Zihui Cheng, Qiguang Chen, Xiao Xu et al.
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Chaoyou Fu, Haojia Lin, Xiong Wang et al.
VITA-Audio: Fast Interleaved Audio-Text Token Generation for Efficient Large Speech-Language Model
Zuwei Long, Yunhang Shen, Chaoyou Fu et al.
VITRIX-CLIPIN: Enhancing Fine-Grained Visual Understanding in CLIP via Instruction-Editing Data and Long Captions
Ziteng Wang, Siqi Yang, Limeng Qiao et al.
VITRIX-UniViTAR: Unified Vision Transformer with Native Resolution
Limeng Qiao, Yiyang Gan, Bairui Wang et al.
VividFace: A Robust and High-Fidelity Video Face Swapping Framework
Hao Shao, Shulun Wang, Yang Zhou et al.
VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching
Siyu Xu, Yunke Wang, Chenghao Xia et al.
VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models
Chongkai Gao, Zixuan Liu, Zhenghao Chi et al.
VLForgery Face Triad: Detection, Localization and Attribution via Multimodal Large Language Models
Xinan He, Yue Zhou, Bing Fan et al.
VLM in a flash: I/O-Efficient Sparsification of Vision-Language Model via Neuron Chunking
Kichang Yang, Seonjun Kim, Minjae Kim et al.
VLMLight: Safety-Critical Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning Architecture
Maonan Wang, Yirong Chen, Aoyu Pang et al.
VLM-R³: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought
Chaoya Jiang, Yongrui Heng, Wei Ye et al.
VLMs can Aggregate Scattered Training Patches
Zhanhui Zhou, Lingjie Chen, Chao Yang et al.
VLMs have Tunnel Vision: Evaluating Nonlocal Visual Reasoning in Leading VLMs
Shmuel Berman, Jia Deng