Most Cited ICCV "local geometry representation" Papers
ATCTrack: Aligning Target-Context Cues with Dynamic Target States for Robust Vision-Language Tracking
Xiaokun Feng, Shiyu Hu, Xuchen Li et al.
Enhancing Adversarial Transferability by Balancing Exploration and Exploitation with Gradient-Guided Sampling
Zenghao Niu, Weicheng Xie, Siyang Song et al.
CWNet: Causal Wavelet Network for Low-Light Image Enhancement
Tongshun Zhang, Pingping Liu, Yubing Lu et al.
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation
Shiqi Huang, Shuting He, Huaiyuan Qin et al.
Federated Representation Angle Learning
Liping Yi, Han Yu, Gang Wang et al.
GeoDistill: Geometry-Guided Self-Distillation for Weakly Supervised Cross-View Localization
Shaowen Tong, Zimin Xia, Alexandre Alahi et al.
Diffusion-based Source-biased Model for Single Domain Generalized Object Detection
Han Jiang, Wenfei Yang, Tianzhu Zhang et al.
Measuring the Impact of Rotation Equivariance on Aerial Object Detection
Xiuyu Wu, Xinhao Wang, Xiubin Zhu et al.
Flow Stochastic Segmentation Networks
Fabio De Sousa Ribeiro, Omar Todd, Charles Jones et al.
From Gaze to Movement: Predicting Visual Attention for Autonomous Driving Human-Machine Interaction based on Programmatic Imitation Learning
Yexin Huang, Yongbin Lin, Lishengsa Yue et al.
ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Predictions
Dubing Chen, Jin Fang, Wencheng Han et al.
From Enhancement to Understanding: Build a Generalized Bridge for Low-light Vision via Semantically Consistent Unsupervised Fine-tuning
Sen Wang, Shao Zeng, Tianjun Gu et al.
G2D: Boosting Multimodal Learning with Gradient-Guided Distillation
Mohammed Rakib, Arunkumar Bagavathi
Unified Video Generation via Next-Set Prediction in Continuous Domain
Zhanzhou Feng, Qingpei Guo, Xinyu Xiao et al.
LazyMAR: Accelerating Masked Autoregressive Models via Feature Caching
Feihong Yan, Qingyan Wei, Jiayi Tang et al.
Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models
Hongyang Wei, Shuaizheng Liu, Chun Yuan et al.
Feature Decomposition-Recomposition in Large Vision-Language Model for Few-Shot Class-Incremental Learning
Zongyao Xue, Meina Kan, Shiguang Shan et al.
Contact-Aware Refinement of Human Pose Pseudo-Ground Truth via Bioimpedance Sensing
Maria-Paola Forte, Nikos Athanasiou, Giulia Ballardini et al.
Integrating Task-Specific and Universal Adapters for Pre-Trained Model-based Class-Incremental Learning
Yan Wang, Da-Wei Zhou, Han-Jia Ye
Can Knowledge be Transferred from Unimodal to Multimodal? Investigating the Transitivity of Multimodal Knowledge Editing
Lingyong Fang, Xinzhong Wang, Depeng Wang et al.
ConsNoTrainLoRA: Data-driven Weight Initialization of Low-rank Adapters using Constraints
Debasmit Das, Hyoungwoo Park, Munawar Hayat et al.
UDC-VIT: A Real-World Video Dataset for Under-Display Cameras
Kyusu Ahn, JiSoo Kim, Sangik Lee et al.
Is Visual in-Context Learning for Compositional Medical Tasks within Reach?
Simon Reiß, Zdravko Marinov, Alexander Jaus et al.
Optimal Transport for Brain-Image Alignment: Unveiling Redundancy and Synergy in Neural Information Processing
Yang Xiao, Wang Lu, Jie Ji et al.
Chimera: Improving Generalist Model with Domain-Specific Experts
Tianshuo Peng, Mingsheng Li, Jiakang Yuan et al.
Enhanced Event-based Dense Stereo via Cross-Sensor Knowledge Distillation
Haihao Zhang, Yunjian Zhang, Jianing Li et al.
Not Only Vision: Evolve Visual Speech Recognition via Peripheral Information
Zhaoxin Yuan, Shuang Yang, Shiguang Shan et al.
ReFlex: Text-Guided Editing of Real Images in Rectified Flow via Mid-Step Feature Extraction and Attention Adaptation
Jimyeong Kim, Jungwon Park, Yeji Song et al.
Imbalance in Balance: Online Concept Balancing in Generation Models
Yukai Shi, Jiarong Ou, Rui Chen et al.
RALoc: Enhancing Outdoor LiDAR Localization via Rotation Awareness
Yuyang Yang, Wen Li, Sheng Ao et al.
Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology
Siyuan Yan, Ming Hu, Yiwen Jiang et al.
MagicID: Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization
Hengjia Li, Lifan Jiang, Xi Xiao et al.
Visual Interestingness Decoded: How GPT-4o Mirrors Human Interests
Fitim Abdullahu, Helmut Grabner
D-Attn: Decomposed Attention for Large Vision-and-Language Model
Chia-Wen Kuo, Sijie Zhu, Fan Chen et al.
Understanding Personal Concept in Open-Vocabulary Semantic Segmentation
Sunghyun Park, Jungsoo Lee, Shubhankar Borse et al.
CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving
Rui Song, Chenwei Liang, Yan Xia et al.
UnZipLoRA: Separating Content and Style from a Single Image
Chang Liu, Viraj Shah, Aiyu Cui et al.
SAM Encoder Breach by Adversarial Simplicial Complex Triggers Downstream Model Failures
Yi Qin, Rui Wang, Tao Huang et al.
Semi-supervised Concept Bottleneck Models
Lijie Hu, Tianhao Huang, Huanyi Xie et al.
WINS: Winograd Structured Pruning for Fast Winograd Convolution
Cheonjun Park, Hyunjae Oh, Mincheol Park et al.
Sparsity Outperforms Low-Rank Projections in Few-Shot Adaptation
Nairouz Mrabah, Nicolas Richet, Ismail Ayed et al.
ART: Adaptive Relation Tuning for Generalized Relation Prediction
Gopika Sudhakaran, Hikaru Shindo, Patrick Schramowski et al.
Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion
Aleksandar Jevtić, Christoph Reich, Felix Wimbauer et al.
No Pose at All: Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views
Ranran Huang, Krystian Mikolajczyk
Cooperative Pseudo Labeling for Unsupervised Federated Classification
Kuangpu Guo, Lijun Sheng, Yongcan Yu et al.
MemDistill: Distilling LiDAR Knowledge into Memory for Camera-Only 3D Object Detection
Donghyeon Kwon, Youngseok Yoon, Hyeongseok Son et al.
From Sharp to Blur: Unsupervised Domain Adaptation for 2D Human Pose Estimation Under Extreme Motion Blur Using Event Cameras
Youngho Kim, Hoonhee Cho, Kuk-Jin Yoon
Augmenting Moment Retrieval: Zero-Dependency Two-Stage Learning
Zhengxuan Wei, Jiajin Tang, Sibei Yang
PAN-Crafter: Learning Modality-Consistent Alignment for PAN-Sharpening
Jeonghyeok Do, Sungpyo Kim, Geunhyuk Youk et al.
Differentially Private Fine-Tuning of Diffusion Models
Yu-Lin Tsai, Yizhe Li, Zekai Chen et al.
IRGPT: Understanding Real-world Infrared Image with Bi-cross-modal Curriculum on Large-scale Benchmark
Zhe Cao, Jin Zhang, Ruiheng Zhang
One Object, Multiple Lies: A Benchmark for Cross-task Adversarial Attack on Unified Vision-Language Models
Jiale Zhao, Xinyang Jiang, Junyao Gao et al.
Reducing Unimodal Bias in Multi-Modal Semantic Segmentation with Multi-Scale Functional Entropy Regularization
Xu Zheng, Yuanhuiyi Lyu, Lutao Jiang et al.
PRVQL: Progressive Knowledge-guided Refinement for Robust Egocentric Visual Query Localization
Bing Fan, Yunhe Feng, Yapeng Tian et al.
Language-Driven Multi-Label Zero-Shot Learning with Semantic Granularity
Shouwen Wang, Qian Wan, Junbin Gao et al.
IM360: Large-scale Indoor Mapping with 360 Cameras
Dongki Jung, Jaehoon Choi, Yonghan Lee et al.
PersonaCraft: Personalized and Controllable Full-Body Multi-Human Scene Generation Using Occlusion-Aware 3D-Conditioned Diffusion
Gwanghyun Kim, Suh Jeon Jeon, Seunggyu Lee et al.
MA-CIR: A Multimodal Arithmetic Benchmark for Composed Image Retrieval
Jaeseok Byun, Young Kyun Jang, Seokhyeon Jeong et al.
Adaptive Learning of High-Value Regions for Semi-Supervised Medical Image Segmentation
Tao Lei, Ziyao Yang, Xingwu Wang et al.
Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning
Xinyao Liu, Diping Song
Integrating Biological Knowledge for Robust Microscopy Image Profiling on De Novo Cell Lines
Jiayuan Chen, Thai-Hoang Pham, Yuanlong Wang et al.
Spectral Sensitivity Estimation with an Uncalibrated Diffraction Grating
Lilika Makabe, Hiroaki Santo, Fumio Okura et al.
TransiT: Transient Transformer for Non-line-of-sight Videography
Ruiqian Li, Siyuan Shen, Suan Xia et al.
On the Complexity-Faithfulness Trade-off of Gradient-Based Explanations
Amir Mehrpanah, Matteo Gamba, Kevin Smith et al.
FedDifRC: Unlocking the Potential of Text-to-Image Diffusion Models in Heterogeneous Federated Learning
Huan Wang, Haoran Li, Huaming Chen et al.
Category-Specific Selective Feature Enhancement for Long-Tailed Multi-Label Image Classification
Ruiqi Du, Xu Tang, Xiangrong Zhang et al.
Registration beyond Points: General Affine Subspace Alignment via Geodesic Distance on Grassmann Manifold
Jaeho Shin, Hyeonjae Gil, Junwoo Jang et al.
An Efficient Post-hoc Framework for Reducing Task Discrepancy of Text Encoders for Composed Image Retrieval
Jaeseok Byun, Seokhyeon Jeong, Wonjae Kim et al.
Find a Scapegoat: Poisoning Membership Inference Attack and Defense to Federated Learning
Wenjin Mo, Zhiyuan Li, Minghong Fang et al.
To Label or Not to Label: PALM – A Predictive Model for Evaluating Sample Efficiency in Active Learning Models
Julia Machnio, Mads Nielsen, Mostafa Mehdipour Ghazi
Personalized Federated Learning under Local Supervision
Qiqi Liu, Jiaqiang Li, Yuchen Liu et al.
Radiant Foam: Real-Time Differentiable Ray Tracing
Shrisudhan Govindarajan, Daniel Rebain, Kwang Moo Yi et al.
COSTARR: Consolidated Open Set Technique with Attenuation for Robust Recognition
Ryan Rabinowitz, Steve Cruz, Walter Scheirer et al.
Information Density Principle for MLLM Benchmarks
Chunyi Li, Xiaozhe Li, Zicheng Zhang et al.
Perspective-Aware Teaching: Adapting Knowledge for Heterogeneous Distillation
Jhe-Hao Lin, Yi Yao, Chan-Feng Hsu et al.
Is Meta-Learning Out? Rethinking Unsupervised Few-Shot Classification with Limited Entropy
Yunchuan Guan, Yu Liu, Ke Zhou et al.
Long-Tailed Classification with Multi-Granularity Semantics
Yuting Liu, Liu Yang, Yu Wang
ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools
Shaofeng Yin, Ting Lei, Yang Liu
FEVER-OOD: Free Energy Vulnerability Elimination for Robust Out-of-Distribution Detection
Brian Isaac-Medina, Mauricio Che, Yona Falinie A. Gaus et al.
Adversarial Purification via Super-Resolution and Diffusion
Mincheol Park, Cheonjun Park, Seungseop Lim et al.
SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models
Xianfu Cheng, Wei Zhang, Shiwei Zhang et al.
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges
Jiaxin Ai, Pengfei Zhou, Xu Pan et al.
Failure Cases Are Better Learned But Boundary Says Sorry: Facilitating Smooth Perception Change for Accuracy-Robustness Trade-Off in Adversarial Training
Yanyun Wang, Li Liu
Secure On-Device Video OOD Detection Without Backpropagation
Li Li, Peilin Cai, Yuxiao Zhou et al.
Learning Counterfactually Decoupled Attention for Open-World Model Attribution
Yu Zheng, Boyang Gong, Fanye Kong et al.
Latte: Collaborative Test-Time Adaptation of Vision-Language Models in Federated Learning
Wenxuan Bao, Ruxi Deng, Ruizhong Qiu et al.
Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation
Zixin Wang, Dong Gong, Sen Wang et al.
Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness
Qifan Yu, Zhebei Shen, Zhongqi Yue et al.
Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations
Chongjie Si, Zhiyi Shi, Xuehui Wang et al.
Partial Forward Blocking: A Novel Data Pruning Paradigm for Lossless Training Acceleration
Dongyue Wu, Zilin Guo, Jialong Zuo et al.
CIARD: Cyclic Iterative Adversarial Robustness Distillation
Liming Lu, Shuchao Pang, Xu Zheng et al.
InfoBridge: Balanced Multimodal Integration through Conditional Dependency Modeling
Chenxin Li, Yifan Liu, Panwang Pan et al.
ChartPoint: Guiding MLLMs with Grounding Reflection for Chart Reasoning
Zhengzhuo Xu, Sinan Du, Yiyan Qi et al.
DiffRefine: Diffusion-based Proposal Specific Point Cloud Densification for Cross-Domain Object Detection
Sangyun Shin, Yuhang He, Xinyu Hou et al.
Boosting Generative Adversarial Transferability with Self-supervised Vision Transformer Features
Shangbo Wu, Yu-an Tan, Ruinan Ma et al.
Divide-and-Conquer for Enhancing Unlabeled Learning, Stability, and Plasticity in Semi-supervised Continual Learning
Yue Duan, Taicai Chen, Lei Qi et al.
Confound from All Sides, Distill with Resilience: Multi-Objective Adversarial Paths to Zero-Shot Robustness
Junhao Dong, Jiao Liu, Xinghua Qu et al.
Mitigating Object Hallucinations via Sentence-Level Early Intervention
Shangpin Peng, Senqiao Yang, Li Jiang et al.
Open-Unfairness Adversarial Mitigation for Generalized Deepfake Detection
Zhaoyang Li, Zhu Teng, Baopeng Zhang et al.
Spatial Preference Rewarding for MLLMs Spatial Understanding
Han Qiu, Peng Gao, Lewei Lu et al.
Structured Policy Optimization: Enhance Large Vision-Language Model via Self-referenced Dialogue
Guohao Sun, Can Qin, Yihao Feng et al.
A Framework for Double-Blind Federated Adaptation of Foundation Models
Nurbek Tastan, Karthik Nandakumar
MMOne: Representing Multiple Modalities in One Scene
Zhifeng Gu, Bing Wang
VisionMath: Vision-Form Mathematical Problem-Solving
Zongyang Ma, Yuxin Chen, Ziqi Zhang et al.
Quanta Neural Networks: From Photons to Perception
Varun Sundar, Tianyi Zhang, Sacha Jungerman et al.
OpenSubstance: A High-quality Measured Dataset of Multi-View and -Lighting Images and Shapes
Fan Pei, Jinchen Bai, Xiang Feng et al.
VGMamba: Attribute-to-Location Clue Reasoning for Quantity-Agnostic 3D Visual Grounding
Zhu Yihang, Jinhao Zhang, Yuxuan Wang et al.
RMultiplex200K: Toward Reliable Multimodal Process Supervision for Visual Language Models on Telecommunications
Sijia Chen, Bin Song
EFTViT: Efficient Federated Training of Vision Transformers with Masked Images on Resource-Constrained Clients
Meihan Wu, Tao Chang, Cui Miao et al.
Target Bias Is All You Need: Zero-Shot Debiasing of Vision-Language Models with Bias Corpus
Taeuk Jang, Hoin Jung, Xiaoqian Wang
Multi-Cache Enhanced Prototype Learning for Test-Time Generalization of Vision-Language Models
Xinyu Chen, Haotian Zhai, Can Zhang et al.
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization
Kesen Zhao, Beier Zhu, Qianru Sun et al.
TRNAS: A Training-Free Robust Neural Architecture Search
Yeming Yang, Qingling Zhu, Jianping Luo et al.
The Inter-Intra Modal Measure: A Predictive Lens on Fine-Tuning Outcomes in Vision-Language Models
Laura Niss, Kevin Vogt-Lowell, Theodoros Tsiligkaridis
What to Distill? Fast Knowledge Distillation with Adaptive Sampling
Byungchul Chae, Seonyeong Heo
Generative Modeling of Shape-Dependent Self-Contact Human Poses
Takehiko Ohkawa, Jihyun Lee, Shunsuke Saito et al.
Met2Net: A Decoupled Two-Stage Spatio-Temporal Forecasting Model for Complex Meteorological Systems
Shaohan Li, Hao Yang, Min Chen et al.
Beyond RGB: Adaptive Parallel Processing for RAW Object Detection
Shani Gamrian, Hila Barel, Feiran Li et al.
PoseSyn: Synthesizing Diverse 3D Pose Data from In-the-Wild 2D Data
Changhee Yang, Hyeonseop Song, Seokhun Choi et al.
TorchAdapt: Towards Light-Agnostic Real-Time Visual Perception
Khurram Azeem Hashmi, Karthik Suresh, Didier Stricker et al.
Human-in-the-Loop Local Corrections of 3D Scene Layouts via Infilling
Christopher Xie, Armen Avetisyan, Henry Howard-Jenkins et al.
DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion
Qingcheng Zhao, Xiang Zhang, Haiyang Xu et al.
Invisible Watermarks, Visible Gains: Steering Machine Unlearning with Bi-Level Watermarking Design
Yuhao Sun, Yihua Zhang, Gaowen Liu et al.
Real3D: Towards Scaling Large Reconstruction Models with Real Images
Hanwen Jiang, Qixing Huang, Georgios Pavlakos
Do It Yourself: Learning Semantic Correspondence from Pseudo-Labels
Olaf Dünkel, Thomas Wimmer, Christian Theobalt et al.
CCMNet: Leveraging Calibrated Color Correction Matrices for Cross-Camera Color Constancy
Dongyoung Kim, Mahmoud Afifi, Dongyun Kim et al.
Zero-shot Inexact CAD Model Alignment from a Single Image
Pattaramanee Arsomngern, Sasikarn Khwanmuang, Matthias Nießner et al.
Motal: Unsupervised 3D Object Detection by Modality and Task-specific Knowledge Transfer
Hai Wu, Hongwei Lin, Xusheng Guo et al.
MoMa-Kitchen: A 100K+ Benchmark for Affordance-Grounded Last-Mile Navigation in Mobile Manipulation
Pingrui Zhang, Xianqiang Gao, Yuhan Wu et al.
OVA-Fields: Weakly Supervised Open-Vocabulary Affordance Fields for Robot Operational Part Detection
Heng Su, Mengying Xie, Nieqing Cao et al.
X-Capture: An Open-Source Portable Device for Multi-Sensory Learning
Samuel Clarke, Suzannah Wistreich, Yanjie Ze et al.
GloPER: Unsupervised Animal Pattern Extraction from Local Reconstruction
Bowen Chen, Yun Sing Koh, Gillian Dobbie
Focal Plane Visual Feature Generation and Matching on a Pixel Processor Array
Hongyi Zhang, Laurie Bose, Jianing Chen et al.
Seeing and Seeing Through the Glass: Real and Synthetic Data for Multi-Layer Depth Estimation
Hongyu Wen, Yiming Zuo, Venkat Subramanian et al.
AR-VRM: Imitating Human Motions for Visual Robot Manipulation with Analogical Reasoning
Dejie Yang, Zijing Zhao, Yang Liu
Unleashing the Temporal Potential of Stereo Event Cameras for Continuous-Time 3D Object Detection
Jae Young Kang, Hoonhee Cho, Kuk-Jin Yoon
PlaneRAS: Learning Planar Primitives for 3D Plane Recovery
Fang Zhang, Wenzhao Zheng, Linqing Zhao et al.
3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark
Wufei Ma, Haoyu Chen, Guofeng Zhang et al.
TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction
Xuying Zhang, Yutong Liu, Yangguang Li et al.
Layer-wise Vision Injection with Disentangled Attention for Efficient LVLMs
Xuange Zhang, Dengjie Li, Bo Liu et al.
HccePose (BF): Predicting Front & Back Surfaces to Construct Ultra-Dense 2D-3D Correspondences for Pose Estimation
Yulin Wang, Mengting Hu, Hongli Li et al.
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
Erik Daxberger, Nina Wenzel, David Griffiths et al.
Understanding Flatness in Generative Models: Its Role and Benefits
Taehwan Lee, Kyeongkook Seo, Jaejun Yoo et al.
Image-Guided Shape-from-Template Using Mesh Inextensibility Constraints
Dinh-Vinh-Thuy Tran, Ruochen Chen, Shaifali Parashar
PHD: Personalized 3D Human Body Fitting with Point Diffusion
Hsuan-I Ho, Chen Guo, Po-Chen Wu et al.
ScoreHOI: Physically Plausible Reconstruction of Human-Object Interaction via Score-Guided Diffusion
Ao Li, Jinpeng Liu, Yixuan Zhu et al.
MonoSOWA: Scalable monocular 3D Object detector Without human Annotations
Jan Skvrna, Lukas Neumann
Estimating 2D Camera Motion with Hybrid Motion Basis
Haipeng Li, Tianhao Zhou, Zhanglei Yang et al.
TESPEC: Temporally-Enhanced Self-Supervised Pretraining for Event Cameras
Mohammad Mohammadi, Ziyi Wu, Igor Gilitschenski
Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision
Xiao Fang, Minhyek Jeon, Zheyang Qin et al.
Revisiting Image Fusion for Multi-Illuminant White-Balance Correction
David Serrano, Aditya Arora, Luis Herranz et al.
Uncertainty-Aware Gradient Stabilization for Small Object Detection
Huixin Sun, Yanjing Li, Linlin Yang et al.
CryoFastAR: Fast Cryo-EM Ab initio Reconstruction Made Easy
Jiakai Zhang, Shouchen Zhou, Haizhao Dai et al.
Event-guided Unified Framework for Low-light Video Enhancement, Frame Interpolation, and Deblurring
Taewoo Kim, Kuk-Jin Yoon
Spatial Alignment and Temporal Matching Adapter for Video-Radar Remote Physiological Measurement
Qian Liang, Ruixu Geng, Jinbo Chen et al.
Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation
Yusuke Hirota, Ryo Hachiuma, Boyi Li et al.
SEHDR: Single-Exposure HDR Novel View Synthesis via 3D Gaussian Bracketing
Yiyu Li, Haoyuan Wang, Ke Xu et al.
MaGS: Reconstructing and Simulating Dynamic 3D Objects with Mesh-adsorbed Gaussian Splatting
Shaojie Ma, Yawei Luo, Wei Yang et al.
CHARM3R: Towards Unseen Camera Height Robust Monocular 3D Detector
Abhinav Kumar, Yuliang Guo, Zhihao Zhang et al.
Learning on the Go: A Meta-learning Object Navigation Model
Xiaorong Qin, Xinhang Song, Sixian Zhang et al.
WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions
Zizhang Li, Hong-Xing Yu, Wei Liu et al.
Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering
Kaixuan Jiang, Yang Liu, Weixing Chen et al.
Not all Views are Created Equal: Analyzing Viewpoint Instabilities in Vision Foundation Models
Mateusz Michalkiewicz, Xinyue Bai, Mahsa Baktashmotlagh et al.
CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image
Arindam Dutta, Meng Zheng, Zhongpai Gao et al.
ReCoT: Reflective Self-Correction Training for Mitigating Confirmation Bias in Large Vision-Language Models
Mengxue Qu, Yibo Hu, Kunyang Han et al.
OMNI-DC: Highly Robust Depth Completion with Multiresolution Depth Integration
Yiming Zuo, Willow Yang, Zeyu Ma et al.
CogNav: Cognitive Process Modeling for Object Goal Navigation with LLMs
Yihan Cao, Jiazhao Zhang, Zhinan Yu et al.
Bridging the Sky and Ground: Towards View-Invariant Feature Learning for Aerial-Ground Person Re-Identification
Wajahat Khalid, Bin Liu, Xulin Li et al.
WalkVLM: Aid Visually Impaired People Walking by Vision Language Model
Zhiqiang Yuan, Ting Zhang, Yeshuang Zhu et al.
VIGFace: Virtual Identity Generation for Privacy-Free Face Recognition Dataset
Minsoo Kim, Min-Cheol Sagong, Gi Pyo Nam et al.
MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space
Lixing Xiao, Shunlin Lu, Huaijin Pi et al.
Mixture of Experts Guided by Gaussian Splatters Matters: A new Approach to Weakly-Supervised Video Anomaly Detection
Giacomo D'Amicantonio, Snehashis Majhi, Quan Kong et al.
What If: Understanding Motion Through Sparse Interactions
Stefan A. Baumann, Nick Stracke, Timy Phan et al.
Beyond Label Semantics: Language-Guided Action Anatomy for Few-shot Action Recognition
Zefeng Qian, Xincheng Yao, Yifei Huang et al.
MamTiff-CAD: Multi-Scale Latent Diffusion with Mamba+ for Complex Parametric Sequence
Liyuan Deng, Yunpeng Bai, Yongkang Dai et al.
Local Scale Equivariance with Latent Deep Equilibrium Canonicalizer
Md Ashiqur Rahman, Chiao-An Yang, Michael N Cheng et al.
EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models
Yufei Cai, Hu Han, Yuxiang Wei et al.
Deep Adaptive Unfolded Network via Spatial Morphology Stripping and Spectral Filtration for Pan-sharpening
Hebaixu Wang, Jiayi Ma
Reference-based Super-Resolution via Image-based Retrieval-Augmented Generation Diffusion
Byeonghun Lee, Hyunmin Cho, Honggyu Choi et al.
Vulnerability-Aware Spatio-Temporal Learning for Generalizable Deepfake Video Detection
Dat Nguyen, Marcella Astrid, Anis Kacem et al.
Multi-modal Identity Extraction
Ryan Webster, Teddy Furon
CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games
Peng Chen, Pi Bu, Yingyao Wang et al.
Blind Noisy Image Deblurring Using Residual Guidance Strategy
Heyan Liu, Jianing Sun, Jun Liu et al.
Drawing Developmental Trajectory from Cortical Surface Reconstruction
Wenxuan Wu, Ruowen Qu, Zhongliang Liu et al.
Less is More: Improving Motion Diffusion Models with Sparse Keyframes
Jinseok Bae, Inwoo Hwang, Young-Yoon Lee et al.
DGTalker: Disentangled Generative Latent Space Learning for Audio-Driven Gaussian Talking Heads
Xiaoxi Liang, Yanbo Fan, Qiya Yang et al.
Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis
Lei-lei Li, Jianwu Fang, Junbin Xiao et al.
Riemannian-Geometric Fingerprints of Generative Models
Hae Jin Song, Laurent Itti
G-DexGrasp: Generalizable Dexterous Grasping Synthesis Via Part-Aware Prior Retrieval and Prior-Assisted Generation
Juntao Jian, Xiuping Liu, Zixuan Chen et al.
ISP2HRNet: Learning to Reconstruct High Resolution Image from Irregularly Sampled Pixels via Hierarchical Gradient Learning
Yuanlin Wang, Ruiqin Xiong, Rui Zhao et al.
Event-Driven Storytelling with Multiple Lifelike Humans in a 3D Scene
Donggeun Lim, Jinseok Bae, Inwoo Hwang et al.
Fast Image Super-Resolution via Consistency Rectified Flow
Jiaqi Xu, Wenbo Li, Haoze Sun et al.
Event-guided HDR Reconstruction with Diffusion Priors
Yixin Yang, Jiawei Zhang, Yang Zhang et al.
AffordDexGrasp: Open-set Language-guided Dexterous Grasp with Generalizable-Instructive Affordance
Yilin Wei, Mu Lin, Yuhao Lin et al.
Robust Adverse Weather Removal via Spectral-based Spatial Grouping
Yuhwan Jeong, Yunseo Yang, Youngho Yoon et al.
Hipandas: Hyperspectral Image Joint Denoising and Super-Resolution by Image Fusion with the Panchromatic Image
Shuang Xu, Zixiang Zhao, Haowen Bai et al.
VideoSetDiff: Identifying and Reasoning Similarities and Differences in Similar Videos
Yue Qiu, Yanjun Sun, Takuma Yagi et al.
HADES: Human Avatar with Dynamic Explicit Hair Strands
Zhanfeng Liao, Hanzhang Tu, Cheng Peng et al.
DreamRelation: Relation-Centric Video Customization
Yujie Wei, Shiwei Zhang, Hangjie Yuan et al.
FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration
Hao Li, Xiang Chen, Jiangxin Dong et al.