Oral Papers
Unifying Text Semantics and Graph Structures for Temporal Text-attributed Graphs with Large Language Models
Siwei Zhang, Yun Xiong, Yateng Tang et al.
UniMotion: A Unified Motion Framework for Simulation, Prediction and Planning
Nan Song, Junzhe Jiang, Jingyu Li et al.
UniRelight: Learning Joint Decomposition and Synthesis for Video Relighting
Kai He, Ruofan Liang, Jacob Munkberg et al.
UniTransfer: Video Concept Transfer via Progressive Spatio-Temporal Decomposition
Guojun Lei, Rong Zhang, Tianhang Liu et al.
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
Zeqian Li, Shangzhe Di, Zhonghua Zhai et al.
UniViT: Unifying Image and Video Understanding in One Vision Encoder
feilong tang, xiangan, Haolin Yang et al.
Unleashing Hour-Scale Video Training for Long Video-Language Understanding
Jingyang Lin, Jialian Wu, Ximeng Sun et al.
Unleashing the Potential of Multimodal LLMs for Zero-Shot Spatio-Temporal Video Grounding
Zaiquan Yang, Yuhao Liu, Gerhard Hancke et al.
Unlocking Point Processes through Point Set Diffusion
David Lüdke, Enric Rabasseda Raventós, Marcel Kollovieh et al.
Unveiling the Spatial-temporal Effective Receptive Fields of Spiking Neural Networks
Jieyuan (Eric) Zhang, Xiaolong Zhou, Shuai Wang et al.
UrbanIng-V2X: A Large-Scale Multi-Vehicle, Multi-Infrastructure Dataset Across Multiple Intersections for Cooperative Perception
Karthikeyan Chandra Sekaran, Markus Geisler, Dominik Rößle et al.
V2V: Scaling Event-Based Vision through Efficient Video-to-Voxel Simulation
Hanyue Lou, Jinxiu Liang, Minggui Teng et al.
VADB: A Large-Scale Video Aesthetic Database with Professional and Multi-Dimensional Annotations
Qianqian Qiao, DanDan Zheng, Yihang Bo et al.
VADTree: Explainable Training-Free Video Anomaly Detection via Hierarchical Granularity-Aware Tree
Wenlong Li, Yifei Xu, Yuan Rao et al.
Variance-Reduced Long-Term Rehearsal Learning with Quadratic Programming Reformulation
Wen-Bo Du, Tian Qin, Tian-Zuo Wang et al.
Variational Counterfactual Intervention Planning to Achieve Target Outcomes
Xin Wang, Shengfei Lyu, Luo Chi et al.
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
Sherwin Bahmani, Ivan Skorokhodov, Aliaksandr Siarohin et al.
Vector Quantization in the Brain: Grid-like Codes in World Models
Xiangyuan Peng, Xingsi Dong, Si Wu
VerbalTS: Generating Time Series from Texts
Shuqi Gu, Chuyue Li, Baoyu Jing et al.
VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data
Thomas Zeng, Shuibai Zhang, Shutong Wu et al.
VETA-DiT: Variance-Equalized and Temporally Adaptive Quantization for Efficient 4-bit Diffusion Transformers
Qinkai XU, yijin liu, YangChen et al.
VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption
Tianxiong Zhong, Xingye Tian, Boyuan Jiang et al.
Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding
Xiaoqian Shen, Wenxuan Zhang, Jun Chen et al.
ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler
Serin Yang, Taesung Kwon, Jong Chul Ye
ViDAR: Video Diffusion-Aware 4D Reconstruction From Monocular Inputs
Michal Nazarczuk, Sibi Catley-Chandar, Thomas Tanay et al.
VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception
Ziang Yan, Yinan He, Xinhao Li et al.
VideoGLUE: Video General Understanding Evaluation of Foundation Models
Boqing Gong, Yin Cui, Long Zhao et al.
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models
Hila Chefer, Uriel Singer, Amit Zohar et al.
VideoLucy: Deep Memory Backtracking for Long Video Understanding
Jialong Zuo, Yongtai Deng, Lingdong Kong et al.
VideoMAR: Autoregressive Video Generation with Continuous Tokens
Hu Yu, Biao Gong, Hangjie Yuan et al.
Video-R1: Reinforcing Video Reasoning in MLLMs
Kaituo Feng, Kaixiong Gong, Bohao Li et al.
VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models
Xiangdong Zhang, Jiaqi Liao, Shaofeng Zhang et al.
VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning
Qi Wang, Yanrui Yu, Ye Yuan et al.
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
Xilin Wei, Xiaoran Liu, Yuhang Zang et al.
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
Xuannan Liu, Zekun Li, Zheqi He et al.
VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking
Runyi Hu, Jie Zhang, Yiming Li et al.
Video World Models with Long-term Spatial Memory
Tong Wu, Shuai Yang, Ryan Po et al.
Vid-SME: Membership Inference Attacks against Large Video Understanding Models
Qi Li, Runpeng Yu, Xinchao Wang
Virtual Fitting Room: Generating Arbitrarily Long Videos of Virtual Try-On from a Single Image
Junkun Chen, Aayush Bansal, Minh Vo et al.
ViSAGe: Video-to-Spatial Audio Generation
Jaeyeon Kim, Heeseung Yun, Gunhee Kim
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It
Yulu Qin, Dheeraj Varghese, Adam Dahlgren Lindström et al.
Vision Language Models are In-Context Value Learners
Yecheng Jason Ma, Joey Hejna, Chuyuan Fu et al.
Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models
Zahra Babaiee, Peyman M. Kiasari, Daniela Rus et al.
VITRIX-UniViTAR: Unified Vision Transformer with Native Resolution
Limeng Qiao, Yiyang Gan, Bairui Wang et al.
VividFace: A Robust and High-Fidelity Video Face Swapping Framework
Hao Shao, Shulun Wang, Yang Zhou et al.
VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching
Siyu Xu, Yunke Wang, Chenghao Xia et al.
VQToken: Neural Discrete Token Representation Learning for Extreme Token Reduction in Video Large Language Models
Haichao Zhang, Yun Fu
VR-Drive: Viewpoint-Robust End-to-End Driving with Feed-Forward 3D Gaussian Splatting
Hoonhee Cho, Jae-Young Kang, Giwon Lee et al.
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Yumeng Li, William H Beluch, Margret Keuper et al.
VVC-Gym: A Fixed-Wing UAV Reinforcement Learning Environment for Multi-Goal Long-Horizon Problems
Xudong Gong, Feng Dawei, Kele Xu et al.