NeurIPS Papers


Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs

Xuannan Liu, Zekun Li, Zheqi He et al.

NeurIPS 2025 oral
7 citations

Videos are Sample-Efficient Supervisions: Behavior Cloning from Videos via Latent Representations

Xin Liu, Haoran Li, Dongbin Zhao

NeurIPS 2025 poster arXiv:2512.21586

VideoTitans: Scalable Video Prediction with Integrated Short- and Long-term Memory

Young-Jae Park, Minseok Seo, Hae-Gon Jeon

NeurIPS 2025 poster

VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation

Wenhao Wang, Yi Yang

NeurIPS 2025 poster arXiv:2503.01739
10 citations

VideoVLA: Video Generators Can Be Generalizable Robot Manipulators

Yichao Shen, Fangyun Wei, Zhiying Du et al.

NeurIPS 2025 poster arXiv:2512.06963
3 citations

Video World Models with Long-term Spatial Memory

Tong Wu, Shuai Yang, Ryan Po et al.

NeurIPS 2025 oral arXiv:2506.05284

Vid-SME: Membership Inference Attacks against Large Video Understanding Models

Qi Li, Runpeng Yu, Xinchao Wang

NeurIPS 2025 oral arXiv:2506.03179
5 citations

ViewCraft3D: High-fidelity and View-Consistent 3D Vector Graphics Synthesis

Chuang Wang, Haitao Zhou, Ling Luo et al.

NeurIPS 2025 poster arXiv:2505.19492
1 citation

ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models

Zixun Fang, Kai Zhu, Zhiheng Liu et al.

NeurIPS 2025 poster arXiv:2506.23513

VIKING: Deep variational inference with stochastic projections

Samuel Matthiesen, Hrittik Roy, Nicholas Krämer et al.

NeurIPS 2025 poster arXiv:2510.23684

VIKI‑R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning

Li Kang, Xiufeng Song, Heng Zhou et al.

NeurIPS 2025 poster arXiv:2506.09049
8 citations

VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models

Haidong Xu, Guangwei Xu, Zhedong Zheng et al.

NeurIPS 2025 poster arXiv:2508.12081
1 citation

Vinci: Deep Thinking in Text-to-Image Generation using Unified Model with Reinforcement Learning

Wang Lin, Wentao Hu, Liyu Jia et al.

NeurIPS 2025 poster

VIPAMIN: Visual Prompt Initialization via Embedding Selection and Subspace Expansion

Jaekyun Park, Hye Won Chung

NeurIPS 2025 poster

Virtual Fitting Room: Generating Arbitrarily Long Videos of Virtual Try-On from a Single Image

Junkun Chen, Aayush Bansal, Minh Vo et al.

NeurIPS 2025 oral

Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data

Zi Liang, Qingqing Ye, Xuan Liu et al.

NeurIPS 2025 spotlight

VisDiff: SDF-Guided Polygon Generation for Visibility Reconstruction, Characterization and Recognition

Rahul Moorthy Mahesh, Jun-Jee Chao, Volkan Isler

NeurIPS 2025 poster
2 citations

Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It

Yulu Qin, Dheeraj Varghese, Adam Dahlgren Lindström et al.

NeurIPS 2025 oral arXiv:2507.13328

Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

Jiaming Han, Hao Chen, Yang Zhao et al.

NeurIPS 2025 poster

Vision-centric Token Compression in Large Language Model

Ling Xing, Alex Jinpeng Wang, Rui Yan et al.

NeurIPS 2025 spotlight arXiv:2502.00791
7 citations

Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Generation

Zheng Anlin, Xin Wen, Xuanyang Zhang et al.

NeurIPS 2025 poster
8 citations

Vision Function Layer in Multimodal LLMs

Cheng Shi, Yizhou Yu, Sibei Yang

NeurIPS 2025 poster arXiv:2509.24791
3 citations

Vision‑Language‑Vision Auto‑Encoder: Scalable Knowledge Distillation from Diffusion Models

Tiezheng Zhang, Yitong Li, Yu-Cheng Chou et al.

NeurIPS 2025 poster arXiv:2507.07104
2 citations

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning

Senqiao Yang, Junyi Li, Xin Lai et al.

NeurIPS 2025 poster

Vision Transformers Don't Need Trained Registers

Nicholas Jiang, Amil Dravid, Alexei Efros et al.

NeurIPS 2025 spotlight arXiv:2506.08010
12 citations

Vision Transformers with Self-Distilled Registers

Zipeng Yan, Yinjie Chen, Chong Zhou et al.

NeurIPS 2025 spotlight arXiv:2505.21501
4 citations

ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding

Jialiang Kang, Han Shu, Wenshuo Li et al.

NeurIPS 2025 poster arXiv:2509.15235
2 citations

ViSPLA: Visual Iterative Self-Prompting for Language-Guided 3D Affordance Learning

Hritam Basak, Zhaozheng Yin

NeurIPS 2025 poster

Visual Anagrams Reveal Hidden Differences in Holistic Shape Processing Across Vision Models

Fenil Doshi, Thomas Fel, Talia Konkle et al.

NeurIPS 2025 poster

Visual Diversity and Region-aware Prompt Learning for Zero-shot HOI Detection

Chanhyeong Yang, Taehoon Song, Jihwan Park et al.

NeurIPS 2025 poster

Visual Instruction Bottleneck Tuning

Changdae Oh, Jiatong Li, Shawn Im et al.

NeurIPS 2025 poster arXiv:2505.13946
2 citations

Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting

Anand Bhattad, Konpat Preechakul, Alexei Efros

NeurIPS 2025 poster arXiv:2503.21770
8 citations

VisualLens: Personalization through Task-Agnostic Visual History

Wang Bill Zhu, Deqing Fu, Kai Sun et al.

NeurIPS 2025 poster arXiv:2411.16034

VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank

Tianhe Wu, Jian Zou, Jie Liang et al.

NeurIPS 2025 spotlight arXiv:2505.14460
30 citations

Visual Structures Help Visual Reasoning: Addressing the Binding Problem in LVLMs

Amirmohammad Izadi, Mohammadali Banayeeanzade, Fatemeh Askari et al.

NeurIPS 2025 poster
1 citation

Visual Sync: Multi‑Camera Synchronization via Cross‑View Object Motion

Shaowei Liu, David Yao, Saurabh Gupta et al.

NeurIPS 2025 poster

Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought

Zihui Cheng, Qiguang Chen, Xiao Xu et al.

NeurIPS 2025 poster

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Chaoyou Fu, Haojia Lin, Xiong Wang et al.

NeurIPS 2025 spotlight arXiv:2501.01957
130 citations

VITA-Audio: Fast Interleaved Audio-Text Token Generation for Efficient Large Speech-Language Model

Zuwei Long, Yunhang Shen, Chaoyou Fu et al.

NeurIPS 2025 poster
17 citations

VITRIX-CLIPIN: Enhancing Fine-Grained Visual Understanding in CLIP via Instruction-Editing Data and Long Captions

Ziteng Wang, Siqi Yang, Limeng Qiao et al.

NeurIPS 2025 poster

VITRIX-UniViTAR: Unified Vision Transformer with Native Resolution

Limeng Qiao, Yiyang Gan, Bairui Wang et al.

NeurIPS 2025 oral

VividFace: A Robust and High-Fidelity Video Face Swapping Framework

Hao Shao, Shulun Wang, Yang Zhou et al.

NeurIPS 2025 oral

VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching

Siyu Xu, Yunke Wang, Chenghao Xia et al.

NeurIPS 2025 oral arXiv:2502.02175
27 citations

VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models

Chongkai Gao, Zixuan Liu, Zhenghao Chi et al.

NeurIPS 2025 poster arXiv:2506.17561
8 citations

VLForgery Face Triad: Detection, Localization and Attribution via Multimodal Large Language Models

Xinan He, Yue Zhou, Bing Fan et al.

NeurIPS 2025 poster

VLM in a flash: I/O-Efficient Sparsification of Vision-Language Model via Neuron Chunking

Kichang Yang, Seonjun Kim, Minjae Kim et al.

NeurIPS 2025 poster

VLMLight: Safety-Critical Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning Architecture

Maonan Wang, Yirong Chen, Aoyu Pang et al.

NeurIPS 2025 poster

VLM-R³: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

Chaoya Jiang, Yongrui Heng, Wei Ye et al.

NeurIPS 2025 poster
18 citations

VLMs can Aggregate Scattered Training Patches

Zhanhui Zhou, Lingjie Chen, Chao Yang et al.

NeurIPS 2025 poster arXiv:2506.03614

VLMs have Tunnel Vision: Evaluating Nonlocal Visual Reasoning in Leading VLMs

Shmuel Berman, Jia Deng

NeurIPS 2025 spotlight