2025 Highlight Papers
DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding
Geng Li, Jinglin Xu, Yunzhen Zhao et al.
Dynamic Point Maps: A Versatile Representation for Dynamic 3D Reconstruction
Edgar Sucar, Zihang Lai, Eldar Insafutdinov et al.
DynFaceRestore: Balancing Fidelity and Quality in Diffusion-Guided Blind Face Restoration with Dynamic Blur-Level Mapping and Guidance
Huu Phu Do, Yu-Wei Chen, Yi-Cheng Liao et al.
Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera
Zhengdi Yu, Stefanos Zafeiriou, Tolga Birdal
EBS-EKF: Accurate and High Frequency Event-based Star Tracking
Albert Reed, Connor Hashemi, Dennis Melamed et al.
Edit360: 2D Image Edits to 3D Assets from Any Angle
Junchao Huang, Xinting Hu, Shaoshuai Shi et al.
EDM: Efficient Deep Feature Matching
Xi Li, Tong Rao, Cihui Pan
Efficient Input-level Backdoor Defense on Text-to-Image Synthesis via Neuron Activation Variation
Shengfang Zhai, Jiajun Li, Yue Liu et al.
Efficient Motion-Aware Video MLLM
Zijia Zhao, Yuqi Huo, Tongtian Yue et al.
EffiDec3D: An Optimized Decoder for High-Performance and Efficient 3D Medical Image Segmentation
Md Mostafijur Rahman, Radu Marculescu
EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision
Yiming Zhao, Taein Kwon, Paul Streli et al.
Electromyography-Informed Facial Expression Reconstruction for Physiological-Based Synthesis and Analysis
Tim Büchner, Christoph Anders, Orlando Guntinas-Lichius et al.
Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions
Tommaso Galliena, Tommaso Apicella, Stefano Rosa et al.
Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding
Yue Fan, Xiaojian Ma, Rongpeng Su et al.
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
Gaoxiang Cong, Jiadong Pan, Liang Li et al.
Empowering Vector Graphics with Consistently Arbitrary Viewing and View-dependent Visibility
Yidi Li, Jun Xiao, Zhengda Lu et al.
Emulating Self-attention with Convolution for Efficient Image Super-Resolution
Dongheon Lee, Seokju Yun, Youngmin Ro
End-to-End HOI Reconstruction Transformer with Graph-based Encoding
Zhenrong Wang, Qi Zheng, Sihan Ma et al.
Enduring, Efficient and Robust Trajectory Prediction Attack in Autonomous Driving via Optimization-Driven Multi-Frame Perturbation Framework
Yi Yu, Weizhen Han, Libing Wu et al.
EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space
Jianrong Zhang, Hehe Fan, Yi Yang
Enhanced Visual-Semantic Interaction with Tailored Prompts for Pedestrian Attribute Recognition
Junyi Wu, Yan Huang, Min Gao et al.
Enrich and Detect: Video Temporal Grounding with Multimodal LLMs
Shraman Pramanick, Effrosyni Mavroudi, Yale Song et al.
Ensemble Foreground Management for Unsupervised Object Discovery
Ziling Wu, Armaghan Moemeni, Praminda Caleb-Solly
Erase Diffusion: Empowering Object Removal Through Calibrating Diffusion Pathways
Yi Liu, Hao Zhou, Benlei Cui et al.
ESC: Erasing Space Concept for Knowledge Deletion
Tae-Young Lee, Sundong Park, Minwoo Jeon et al.
ESSENTIAL: Episodic and Semantic Memory Integration for Video Class-Incremental Learning
Jongseo Lee, Kyungho Bae, Kyle Min et al.
Estimating Body and Hand Motion in an Ego-sensed World
Brent Yi, Vickie Ye, Maya Zheng et al.
ETAP: Event-based Tracking of Any Point
Friedhelm Hamann, Daniel Gehrig, Filbert Febryanto et al.
ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness
Boqian Li, Zeyu Cai, Michael Black et al.
Ev-3DOD: Pushing the Temporal Boundaries of 3D Object Detection with Event Cameras
Hoonhee Cho, Jae-Young Kang, Youngho Kim et al.
Evading Data Provenance in Deep Neural Networks
Hongyu Zhu, Sichu Liang, Wenwen Wang et al.
EvEnhancer: Empowering Effectiveness, Efficiency and Generalizability for Continuous Space-Time Video Super-Resolution with Events
Shuoyan Wei, Feng Li, Shengeng Tang et al.
Event Ellipsometer: Event-based Mueller-Matrix Video Imaging
Ryota Maeda, Yunseong Moon, Seung-Hwan Baek
Event Fields: Capturing Light Fields at High Speed, Resolution, and Dynamic Range
Ziyuan Qu, Zihao Zou, Vivek Boominathan et al.
EventPSR: Surface Normal and Reflectance Estimation from Photometric Stereo Using an Event Camera
Bohan Yu, Jin Han, Boxin Shi et al.
EventUPS: Uncalibrated Photometric Stereo Using an Event Camera
Jinxiu Liang, Bohan Yu, Siqi Yang et al.
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models
Haiwen Diao, Xiaotong Li, Yufeng Cui et al.
Exact: Exploring Space-Time Perceptive Clues for Weakly Supervised Satellite Image Time Series Semantic Segmentation
Hao Zhu, Yan Zhu, Jiayu Xiao et al.
Explaining Human Preferences via Metrics for Structured 3D Reconstruction
Jack Langerman, Denis Rozumny, Yuzhong Huang et al.
Exploring View Consistency for Scene-Adaptive Low-Light Light Field Image Enhancement
Shuo Zhang, Chen Gao, Youfang Lin
Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think
Zhenyi Lu, Xiaoye Qu et al.
F^3OCUS - Federated Finetuning of Vision-Language Foundation Models with Optimal Client Layer Updating Strategy via Multi-objective Meta-Heuristics
Pramit Saha, Felix Wagner, Divyanshu Mishra et al.
Fast Globally Optimal and Geometrically Consistent 3D Shape Matching
Paul Roetzer, Florian Bernard
F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration
Lu Liu, Huiyu Duan, Qiang Hu et al.
Feature Purification Matters: Suppressing Outlier Propagation for Training-Free Open-Vocabulary Semantic Segmentation
Shuo Jin, Siyue Yu, Bingfeng Zhang et al.
Few-shot Implicit Function Generation via Equivariance
Suizhi Huang, Xingyi Yang, Hongtao Lu et al.
Few-Shot Pattern Detection via Template Matching and Regression
Eunchan Jo, Dahyun Kang, Sanghyun Kim et al.
FIction: 4D Future Interaction Prediction from Video
Kumar Ashutosh, Georgios Pavlakos, Kristen Grauman
Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning
Bardia Safaei, Faizan Siddiqui, Jiacong Xu et al.
FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation
Zhuguanyu Wu, Shihe Wang, Jiayi Zhang et al.