Oral Papers

Unifying Text Semantics and Graph Structures for Temporal Text-attributed Graphs with Large Language Models

Siwei Zhang, Yun Xiong, Yateng Tang et al.

NeurIPS 2025 • oral

UniMotion: A Unified Motion Framework for Simulation, Prediction and Planning

Nan Song, Junzhe Jiang, Jingyu Li et al.

NeurIPS 2025 • oral

UniRelight: Learning Joint Decomposition and Synthesis for Video Relighting

Kai He, Ruofan Liang, Jacob Munkberg et al.

NeurIPS 2025 • oral

UniTransfer: Video Concept Transfer via Progressive Spatio-Temporal Decomposition

Guojun Lei, Rong Zhang, Tianhang Liu et al.

NeurIPS 2025 • oral

Universal Video Temporal Grounding with Generative Multi-modal Large Language Models

Zeqian Li, Shangzhe Di, Zhonghua Zhai et al.

NeurIPS 2025 • oral • arXiv:2506.18883 • 9 citations

UniViT: Unifying Image and Video Understanding in One Vision Encoder

Feilong Tang, Xiang An, Haolin Yang et al.

NeurIPS 2025 • oral

Unleashing Hour-Scale Video Training for Long Video-Language Understanding

Jingyang Lin, Jialian Wu, Ximeng Sun et al.

NeurIPS 2025 • oral

Unleashing the Potential of Multimodal LLMs for Zero-Shot Spatio-Temporal Video Grounding

Zaiquan Yang, Yuhao Liu, Gerhard Hancke et al.

NeurIPS 2025 • oral • arXiv:2509.15178 • 2 citations

Unlocking Point Processes through Point Set Diffusion

David Lüdke, Enric Rabasseda Raventós, Marcel Kollovieh et al.

ICLR 2025 • oral • arXiv:2410.22493 • 5 citations

Unveiling the Spatial-temporal Effective Receptive Fields of Spiking Neural Networks

Jieyuan (Eric) Zhang, Xiaolong Zhou, Shuai Wang et al.

NeurIPS 2025 • oral

UrbanIng-V2X: A Large-Scale Multi-Vehicle, Multi-Infrastructure Dataset Across Multiple Intersections for Cooperative Perception

Karthikeyan Chandra Sekaran, Markus Geisler, Dominik Rößle et al.

NeurIPS 2025 • oral • arXiv:2510.23478 • 1 citation

V2V: Scaling Event-Based Vision through Efficient Video-to-Voxel Simulation

Hanyue Lou, Jinxiu Liang, Minggui Teng et al.

NeurIPS 2025 • oral • arXiv:2505.16797 • 2 citations

VADB: A Large-Scale Video Aesthetic Database with Professional and Multi-Dimensional Annotations

Qianqian Qiao, DanDan Zheng, Yihang Bo et al.

NeurIPS 2025 • oral

VADTree: Explainable Training-Free Video Anomaly Detection via Hierarchical Granularity-Aware Tree

Wenlong Li, Yifei Xu, Yuan Rao et al.

NeurIPS 2025 • oral • arXiv:2510.22693 • 1 citation

Variance-Reduced Long-Term Rehearsal Learning with Quadratic Programming Reformulation

Wen-Bo Du, Tian Qin, Tian-Zuo Wang et al.

NeurIPS 2025 • oral

Variational Counterfactual Intervention Planning to Achieve Target Outcomes

Xin Wang, Shengfei Lyu, Luo Chi et al.

ICML 2025 • oral • 2 citations

VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control

Sherwin Bahmani, Ivan Skorokhodov, Aliaksandr Siarohin et al.

ICLR 2025 • oral • arXiv:2407.12781 • 114 citations

Vector Quantization in the Brain: Grid-like Codes in World Models

Xiangyuan Peng, Xingsi Dong, Si Wu

NeurIPS 2025 • oral

VerbalTS: Generating Time Series from Texts

Shuqi Gu, Chuyue Li, Baoyu Jing et al.

ICML 2025 • oral

VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data

Thomas Zeng, Shuibai Zhang, Shutong Wu et al.

ICML 2025 • oral

VETA-DiT: Variance-Equalized and Temporally Adaptive Quantization for Efficient 4-bit Diffusion Transformers

Qinkai Xu, Yijin Liu, Yang Chen et al.

NeurIPS 2025 • oral

VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption

Tianxiong Zhong, Xingye Tian, Boyuan Jiang et al.

NeurIPS 2025 • oral • arXiv:2505.12053 • 3 citations

Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding

Xiaoqian Shen, Wenxuan Zhang, Jun Chen et al.

NeurIPS 2025 • oral • arXiv:2510.14032 • 6 citations

ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler

Serin Yang, Taesung Kwon, Jong Chul Ye

ICLR 2025 • oral • 9 citations

ViDAR: Video Diffusion-Aware 4D Reconstruction From Monocular Inputs

Michal Nazarczuk, Sibi Catley-Chandar, Thomas Tanay et al.

NeurIPS 2025 • oral

VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception

Ziang Yan, Yinan He, Xinhao Li et al.

NeurIPS 2025 • oral • arXiv:2509.21100 • 13 citations

VideoGLUE: Video General Understanding Evaluation of Foundation Models

Boqing Gong, Yin Cui, Long Zhao et al.

ICLR 2025 • oral

VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models

Hila Chefer, Uriel Singer, Amit Zohar et al.

ICML 2025 • oral

VideoLucy: Deep Memory Backtracking for Long Video Understanding

Jialong Zuo, Yongtai Deng, Lingdong Kong et al.

NeurIPS 2025 • oral • arXiv:2510.12422 • 2 citations

VideoMAR: Autoregressive Video Generation with Continuous Tokens

Hu Yu, Biao Gong, Hangjie Yuan et al.

NeurIPS 2025 • oral

Video-R1: Reinforcing Video Reasoning in MLLMs

Kaituo Feng, Kaixiong Gong, Bohao Li et al.

NeurIPS 2025 • oral • arXiv:2503.21776 • 236 citations

VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models

Xiangdong Zhang, Jiaqi Liao, Shaofeng Zhang et al.

NeurIPS 2025 • oral

VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning

Qi Wang, Yanrui Yu, Ye Yuan et al.

NeurIPS 2025 • oral

VideoRoPE: What Makes for Good Video Rotary Position Embedding?

Xilin Wei, Xiaoran Liu, Yuhang Zang et al.

ICML 2025 • oral

Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs

Xuannan Liu, Zekun Li, Zheqi He et al.

NeurIPS 2025 • oral • 7 citations

VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking

Runyi Hu, Jie Zhang, Yiming Li et al.

ICLR 2025 • oral

Video World Models with Long-term Spatial Memory

Tong Wu, Shuai Yang, Ryan Po et al.

NeurIPS 2025 • oral

Vid-SME: Membership Inference Attacks against Large Video Understanding Models

Qi Li, Runpeng Yu, Xinchao Wang

NeurIPS 2025 • oral • arXiv:2506.03179 • 5 citations

Virtual Fitting Room: Generating Arbitrarily Long Videos of Virtual Try-On from a Single Image

Junkun Chen, Aayush Bansal, Minh Vo et al.

NeurIPS 2025 • oral

ViSAGe: Video-to-Spatial Audio Generation

Jaeyeon Kim, Heeseung Yun, Gunhee Kim

ICLR 2025 • oral

Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It

Yulu Qin, Dheeraj Varghese, Adam Dahlgren Lindström et al.

NeurIPS 2025 • oral • arXiv:2507.13328

Vision Language Models are In-Context Value Learners

Yecheng Jason Ma, Joey Hejna, Chuyuan Fu et al.

ICLR 2025 • oral

Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models

Zahra Babaiee, Peyman M. Kiasari, Daniela Rus et al.

ICML 2025 • oral • 1 citation

VITRIX-UniViTAR: Unified Vision Transformer with Native Resolution

Limeng Qiao, Yiyang Gan, Bairui Wang et al.

NeurIPS 2025 • oral

VividFace: A Robust and High-Fidelity Video Face Swapping Framework

Hao Shao, Shulun Wang, Yang Zhou et al.

NeurIPS 2025 • oral

VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching

Siyu Xu, Yunke Wang, Chenghao Xia et al.

NeurIPS 2025 • oral • arXiv:2502.02175 • 27 citations

VQToken: Neural Discrete Token Representation Learning for Extreme Token Reduction in Video Large Language Models

Haichao Zhang, Yun Fu

NeurIPS 2025 • oral • arXiv:2503.16980 • 3 citations

VR-Drive: Viewpoint-Robust End-to-End Driving with Feed-Forward 3D Gaussian Splatting

Hoonhee Cho, Jae-Young Kang, Giwon Lee et al.

NeurIPS 2025 • oral

VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis

Yumeng Li, William H Beluch, Margret Keuper et al.

ICLR 2025 • oral • 10 citations

VVC-Gym: A Fixed-Wing UAV Reinforcement Learning Environment for Multi-Goal Long-Horizon Problems

Xudong Gong, Feng Dawei, Kele Xu et al.

ICLR 2025 • oral