Highlight Papers
Unified Multimodal Understanding via Byte-Pair Visual Encoding
Wanpeng Zhang, Yicheng Feng, Hao Luo et al.
Unified Reconstruction of Static and Dynamic Scenes from Events
Qiyao Gao, Peiqi Duan, Hanyue Lou et al.
UniPhys: Unified Planner and Controller with Diffusion for Flexible Physics-Based Character Control
Yan Wu, Korrawe Karunratanakul, Zhengyi Luo et al.
UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing
Yiheng Li, RuiBing Hou, Hong Chang et al.
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics
Xi Chen, Zhifei Zhang, He Zhang et al.
UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior
I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu et al.
Universal Scene Graph Generation
Shengqiong Wu, Hao Fei, Tat-seng Chua
Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation
Bolin Lai, Felix Juefei-Xu, Miao Liu et al.
Unleashing Vecset Diffusion Model for Fast Shape Generation
Zeqiang Lai, Zhao Yunfei, Zibo Zhao et al.
Unlocking Generalization Power in LiDAR Point Cloud Registration
Zhenxuan Zeng, Qiao Wu, Xiyu Zhang et al.
UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI
Fangwei Zhong, Kui Wu, Churan Wang et al.
Unsupervised Continual Domain Shift Learning with Multi-Prototype Modeling
Haopeng Sun, Yingwei Zhang, Lumin Xu et al.
Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras
Shuang Guo, Friedhelm Hamann, Guillermo Gallego
Unveiling Differences in Generative Models: A Scalable Differential Clustering Approach
Jingwei Zhang, Mohammad Jalali, Cheuk Ting Li et al.
UnZipLoRA: Separating Content and Style from a Single Image
Chang Liu, Viraj Shah, Aiyu Cui et al.
USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting
Kang Chen, Jiyuan Zhang, Zecheng Hao et al.
v-CLR: View-Consistent Learning for Open-World Instance Segmentation
Chang-Bin Zhang, Jinhong Ni, Yujie Zhong et al.
VEU-Bench: Towards Comprehensive Understanding of Video Editing
Bozheng Li, Yongliang Wu, Yi Lu et al.
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
Sili Chen, Hengkai Guo, Shengnan Zhu et al.
Video Individual Counting for Moving Drones
Yaowu Fan, Jia Wan, Tao Han et al.
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Chaoyou Fu, Yuhan Dai, Yongdong Luo et al.
Video Motion Graphs
Haiyang Liu, Zhan Xu, Fating Hong et al.
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
Hanyang Wang, Fangfu Liu, Jiawei Chi et al.
VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge
Vishwesh Nath, Wenqi Li, Dong Yang et al.
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images
Boyang Deng, Kyle Genova, Songyou Peng et al.
Visual Representation Learning through Causal Intervention for Controllable Image Editing
Shanshan Huang, Haoxuan Li, Chunyuan Zheng et al.
Visual Test-time Scaling for GUI Agent Grounding
Tiange Luo, Lajanugen Logeswaran, Justin Johnson et al.
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
Lei Li, Wei Yuancheng, Zhihui Xie et al.
VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory
Runjia Li, Philip Torr, Andrea Vedaldi et al.
Volume Tells: Dual Cycle-Consistent Diffusion for 3D Fluorescence Microscopy De-noising and Super-Resolution
Zelin Li, Chenwei Wang, Zhaoke Huang et al.
Volumetrically Consistent 3D Gaussian Rasterization
Chinmay Talegaonkar, Yash Belhe, Ravi Ramamoorthi et al.
VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions
Marko Mihajlovic, Siwei Zhang, Gen Li et al.
VRM: Knowledge Distillation via Virtual Relation Matching
Weijia Zhang, Fei Xie, Weidong Cai et al.
Wasserstein Style Distribution Analysis and Transform for Stylized Image Generation
Xi Yu, Xiang Gu, Zhihao Shi et al.
WeaveSeg: Iterative Contrast-weaving and Spectral Feature-refining for Nuclei Instance Segmentation
Jiajia Li, Huisi Wu, Jing Qin
What to Distill? Fast Knowledge Distillation with Adaptive Sampling
Byungchul Chae, Seonyeong Heo
When Confidence Fails: Revisiting Pseudo-Label Selection in Semi-supervised Semantic Segmentation
Pan Liu, Jinshi Liu
Where, What, Why: Towards Explainable Driver Attention Prediction
Yuchen Zhou, Jiayu Tang, Xiaoyan Xiao et al.
Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos
Sagnik Majumder, Tushar Nagarajan, Ziad Al-Halah et al.
WINS: Winograd Structured Pruning for Fast Winograd Convolution
Cheonjun Park, Hyunjae Oh, Mincheol Park et al.
WISH: Weakly Supervised Instance Segmentation using Heterogeneous Labels
Hyeokjun Kweon, Kuk-Jin Yoon
WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions
Zizhang Li, Hong-Xing Yu, Wei Liu et al.
WonderWorld: Interactive 3D Scene Generation from a Single Image
Hong-Xing Yu, Haoyi Duan, Charles Herrmann et al.
World-consistent Video Diffusion with Explicit 3D Modeling
Qihang Zhang, Shuangfei Zhai, Miguel Ángel Bautista et al.
X-Dancer: Expressive Music to Human Dance Video Generation
Zeyuan Chen, Hongyi Xu, Guoxian Song et al.
X-Dyna: Expressive Dynamic Human Image Animation
Di Chang, Hongyi Xu, You Xie et al.
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?
Fengxiang Wang, Hongzhen Wang, Zonghao Guo et al.
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
Seil Kang, Jinyeong Kim, Junhyeok Kim et al.
Your ViT is Secretly an Image Segmentation Model
Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans et al.
You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale
Baorui Ma, Huachen Gao, Haoge Deng et al.