Most Cited ICCV "structure-conditioned generation" Papers
Deciphering Cross-Modal Alignment in Large Vision-Language Models via Modality Integration Rate
Qidong Huang, Xiaoyi Dong, Pan Zhang et al.
TITAN: Query-Token based Domain Adaptive Adversarial Learning
Tajamul Ashraf, Janibul Bashir
StolenLoRA: Exploring LoRA Extraction Attacks via Synthetic Data
Yixu Wang, Yan Teng, Yingchun Wang et al.
Doodle Your Keypoints: Sketch-Based Few-Shot Keypoint Detection
Subhajit Maity, Ayan Bhunia, Subhadeep Koley et al.
Partial Forward Blocking: A Novel Data Pruning Paradigm for Lossless Training Acceleration
Dongyue Wu, Zilin Guo, Jialong Zuo et al.
LIFT: Latent Implicit Functions for Task- and Data-Agnostic Encoding
Amirhossein Kazerouni, Soroush Mehraban, Michael Brudno et al.
ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers
Qianhao Yuan, Qingyu Zhang, Yanjiang Liu et al.
CIARD: Cyclic Iterative Adversarial Robustness Distillation
Liming Lu, Shuchao Pang, Xu Zheng et al.
MOBIUS: Big-to-Mobile Universal Instance Segmentation via Multi-modal Bottleneck Fusion and Calibrated Decoder Pruning
Mattia Segu, Marta Tintore Gazulla, Yongqin Xian et al.
Moderating the Generalization of Score-based Generative Model
Wan Jiang, He Wang, Xin Zhang et al.
LLM-assisted Entropy-based Adaptive Distillation for Unsupervised Fine-grained Visual Representation Learning
Jianfeng Dong, Danfeng Luo, Daizong Liu et al.
InfoBridge: Balanced Multimodal Integration through Conditional Dependency Modeling
Chenxin Li, Yifan Liu, Panwang Pan et al.
ChartPoint: Guiding MLLMs with Grounding Reflection for Chart Reasoning
Zhengzhuo Xu, Sinan Du, Yiyan Qi et al.
DiffRefine: Diffusion-based Proposal Specific Point Cloud Densification for Cross-Domain Object Detection
Sangyun Shin, Yuhang He, Xinyu Hou et al.
Gradient Short-Circuit: Efficient Out-of-Distribution Detection via Feature Intervention
Jiawei Gu, Ziyue Qiao, Zechao Li
Boundary Probing for Input Privacy Protection When Using LMM Services
Xiaofei Hui, Haoxuan Qu, Ping Hu et al.
UPRE: Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement
Xiao Zhang, Fei Wei, Yong Wang et al.
Dataset Distillation as Data Compression: A Rate-Utility Perspective
Youneng Bao, Yiping Liu, Zhuo Chen et al.
Boosting Generative Adversarial Transferability with Self-supervised Vision Transformer Features
Shangbo Wu, Yu-an Tan, Ruinan Ma et al.
Open-set Cross Modal Generalization via Multimodal Unified Representation
Hai Huang, Yan Xia, Shulei Wang et al.
Adversarial Data Augmentation for Single Domain Generalization via Lyapunov Exponent-Guided Optimization
Zuyu Zhang, Ning Chen, Yongshan Liu et al.
NegRefine: Refining Negative Label-Based Zero-Shot OOD Detection
Amirhossein Ansari, Ke Wang, Pulei Xiong
Divide-and-Conquer for Enhancing Unlabeled Learning, Stability, and Plasticity in Semi-supervised Continual Learning
Yue Duan, Taicai Chen, Lei Qi et al.
A Unified Framework to BRIDGE Complete and Incomplete Deep Multi-View Clustering under Non-IID Missing Patterns
Xiaorui Jiang, Buyun He, Peng Yuan Zhou et al.
GCAV: A Global Concept Activation Vector Framework for Cross-Layer Consistency in Interpretability
Zhenghao He, Sanchit Sinha, Guangzhi Xiong et al.
Confound from All Sides, Distill with Resilience: Multi-Objective Adversarial Paths to Zero-Shot Robustness
Junhao Dong, Jiao Liu, Xinghua Qu et al.
Mitigating Object Hallucinations via Sentence-Level Early Intervention
Shangpin Peng, Senqiao Yang, Li Jiang et al.
Active Membership Inference Test (aMINT): Enhancing Model Auditability with Multi-Task Learning
Daniel DeAlcala, Aythami Morales, Julian Fierrez et al.
Open-Unfairness Adversarial Mitigation for Generalized Deepfake Detection
Zhaoyang Li, Zhu Teng, Baopeng Zhang et al.
Spatial Preference Rewarding for MLLMs Spatial Understanding
Han Qiu, Peng Gao, Lewei Lu et al.
Structured Policy Optimization: Enhance Large Vision-Language Model via Self-referenced Dialogue
Guohao Sun, Can Qin, Yihao Feng et al.
Semi-ViM: Bidirectional State Space Model for Mitigating Label Imbalance in Semi-Supervised Learning
Hongyang He, Hongyang Xie, Haochen You et al.
Contact-Aware Refinement of Human Pose Pseudo-Ground Truth via Bioimpedance Sensing
Maria-Paola Forte, Nikos Athanasiou, Giulia Ballardini et al.
Beyond the Limits: Overcoming Negative Correlation of Activation-Based Training-Free NAS
Haidong Kang, Lianbo Ma, Pengjun Chen et al.
Integrating Task-Specific and Universal Adapters for Pre-Trained Model-based Class-Incremental Learning
Yan Wang, Da-Wei Zhou, Han-Jia Ye
Semi-supervised Deep Transfer for Regression without Domain Alignment
Mainak Biswas, Ambedkar Dukkipati, Devarajan Sridharan
From Easy to Hard: The MIR Benchmark for Progressive Interleaved Multi-Image Reasoning
Hang Du, Jiayang Zhang, Guoshun Nan et al.
Diffusion Guided Adaptive Augmentation for Generalization in Visual Reinforcement Learning
Jeong Woon Lee, Hyoseok Hwang
A Framework for Double-Blind Federated Adaptation of Foundation Models
Nurbek Tastan, Karthik Nandakumar
EA-ViT: Efficient Adaptation for Elastic Vision Transformer
Chen Zhu, Wangbo Zhao, Huiwen Zhang et al.
Feature Coding in the Era of Large Models: Dataset, Test Conditions, and Benchmark
Changsheng Gao, Yifan Ma, Qiaoxi Chen et al.
MMOne: Representing Multiple Modalities in One Scene
Zhifeng Gu, Bing Wang
MM-IFEngine: Towards Multimodal Instruction Following
Shengyuan Ding, Wu Shenxi, Xiangyu Zhao et al.
RainbowPrompt: Diversity-Enhanced Prompt-Evolving for Continual Learning
Kiseong Hong, Gyeong-Hyeon Kim, Eunwoo Kim
VisionMath: Vision-Form Mathematical Problem-Solving
Zongyang Ma, Yuxin Chen, Ziqi Zhang et al.
Dataset Distillation via the Wasserstein Metric
Haoyang Liu, Peiran Wang, Yijiang Li et al.
A Good Teacher Adapts Their Knowledge for Distillation
Chengyao Qian, Trung Le, Mehrtash Harandi
Quanta Neural Networks: From Photons to Perception
Varun Sundar, Tianyi Zhang, Sacha Jungerman et al.
AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving
Ruifei Zhang, Junlin Xie, Wei Zhang et al.
Depth Any Event Stream: Enhancing Event-based Monocular Depth Estimation via Dense-to-Sparse Distillation
Jinjing Zhu, Tianbo Pan, Zidong Cao et al.
Consistent Time-of-Flight Depth Denoising via Graph-Informed Geometric Attention
Weida Wang, Changyong He, Jin Zeng et al.
MPBR: Multimodal Progressive Bidirectional Reasoning for Open-Set Fine-Grained Recognition
Junfu Tan, Peiguang Jing, Yu Zhu et al.
MAVias: Mitigate any Visual Bias
Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos et al.
OpenSubstance: A High-quality Measured Dataset of Multi-View and -Lighting Images and Shapes
Fan Pei, Jinchen Bai, Xiang Feng et al.
VGMamba: Attribute-to-Location Clue Reasoning for Quantity-Agnostic 3D Visual Grounding
Zhu Yihang, Jinhao Zhang, Yuxuan Wang et al.
AnnofreeOD: Detecting All Classes at Low Frame Rates Without Human Annotations
Boyi Sun, Yuhang Liu, Houxin He et al.
TWIST & SCOUT: Grounding Multimodal LLM-Experts by Forget-Free Tuning
Aritra Bhowmik, Mohammad Mahdi Derakhshani, Dennis Koelma et al.
Controlling Multimodal LLMs via Reward-guided Decoding
Oscar Mañas, Pierluca D'Oro, Koustuv Sinha et al.
CE-FAM: Concept-Based Explanation via Fusion of Activation Maps
Michihiro Kuroki, Toshihiko Yamasaki
PEFTDiff: Diffusion-Guided Transferability Estimation for Parameter-Efficient Fine-Tuning
Prafful Khoba, Zijian Wang, Chetan Arora et al.
RMultiplex200K: Toward Reliable Multimodal Process Supervision for Visual Language Models on Telecommunications
Sijia Chen, Bin Song
Class-Wise Federated Averaging for Efficient Personalization
Gyuejeong Lee, Daeyoung Choi
Towards Privacy-preserved Pre-training of Remote Sensing Foundation Models with Federated Mutual-guidance Learning
Jieyi Tan, Chengwei Zhang, Bo Dang et al.
Multi-view Gaze Target Estimation
Qiaomu Miao, Vivek Golani, Jingyi Xu et al.
EFTViT: Efficient Federated Training of Vision Transformers with Masked Images on Resource-Constrained Clients
Meihan Wu, Tao Chang, Cui Miao et al.
Target Bias Is All You Need: Zero-Shot Debiasing of Vision-Language Models with Bias Corpus
Taeuk Jang, Hoin Jung, Xiaoqian Wang
Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
Zitian Wang, Yue Liao, Rong Kang et al.
Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility
Melih Barsbey, Lucas Prieto, Stefanos Zafeiriou et al.
Dynamic Multi-Layer Null Space Projection for Vision-Language Continual Learning
Borui Kang, Lei Wang, Zhiping Wu et al.
FedMeNF: Privacy-Preserving Federated Meta-Learning for Neural Fields
Junhyeog Yun, Minui Hong, Gunhee Kim
Prototype Guided Backdoor Defense via Activation Space Manipulation
Venkat Adithya Amula, Sunayana Samavedam, Saurabh Saini et al.
RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction
Johannes Künzel, Anna Hilsmann, Peter Eisert
Analyzing Finetuning Representation Shift for Multimodal LLMs Steering
Pegah Khayatan, Mustafa Shukor, Jayneel Parekh et al.
Multi-Cache Enhanced Prototype Learning for Test-Time Generalization of Vision-Language Models
Xinyu Chen, Haotian Zhai, Can Zhang et al.
AVAM: a Universal Training-free Adaptive Visual Anchoring Embedded into Multimodal Large Language Model for Multi-image Question Answering
Kang Zeng, Guojin Zhong, Jintao Cheng et al.
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization
Kesen Zhao, Beier Zhu, Qianru Sun et al.
TRNAS: A Training-Free Robust Neural Architecture Search
Yeming Yang, Qingling Zhu, Jianping Luo et al.
Staining and Locking Computer Vision Models Without Retraining
Oliver Sutton, Qinghua Zhou, George Leete et al.
The Inter-Intra Modal Measure: A Predictive Lens on Fine-Tuning Outcomes in Vision-Language Models
Laura Niss, Kevin Vogt-Lowell, Theodoros Tsiligkaridis
What to Distill? Fast Knowledge Distillation with Adaptive Sampling
Byungchul Chae, Seonyeong Heo
Flexi-FSCIL: Adaptive Knowledge Retention for Breaking the Stability-Plasticity Dilemma in Few-Shot Class-Incremental Learning
Wufei Xie, Yalin Wang, Chenliang Liu et al.
Multispectral Demosaicing via Dual Cameras
SaiKiran Tedla, Junyong Lee, Beixuan Yang et al.
Generative Modeling of Shape-Dependent Self-Contact Human Poses
Takehiko Ohkawa, Jihyun Lee, Shunsuke Saito et al.
Met2Net: A Decoupled Two-Stage Spatio-Temporal Forecasting Model for Complex Meteorological Systems
Shaohan Li, Hao Yang, Min Chen et al.
Beyond RGB: Adaptive Parallel Processing for RAW Object Detection
Shani Gamrian, Hila Barel, Feiran Li et al.
TorchAdapt: Towards Light-Agnostic Real-Time Visual Perception
Khurram Azeem Hashmi, Karthik Suresh, Didier Stricker et al.
Human-in-the-Loop Local Corrections of 3D Scene Layouts via Infilling
Christopher Xie, Armen Avetisyan, Henry Howard-Jenkins et al.
POMATO: Marrying Pointmap Matching with Temporal Motions for Dynamic 3D Reconstruction
Songyan Zhang, Yongtao Ge, Jinyuan Tian et al.
Boosting Class Representation via Semantically Related Instances for Robust Long-Tailed Learning with Noisy Labels
Yuhang Li, Zhuying Li, Yuheng Jia
CAT: A Unified Click-and-Track Framework for Realistic Tracking
Yongsheng Yuan, Jie Zhao, Dong Wang et al.
Diffusion-Based Extreme High-speed Scenes Reconstruction with the Complementary Vision Sensor
Yapeng Meng, Yihan Lin, Taoyi Wang et al.
DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion
Qingcheng Zhao, Xiang Zhang, Haiyang Xu et al.
Invisible Watermarks, Visible Gains: Steering Machine Unlearning with Bi-Level Watermarking Design
Yuhao Sun, Yihua Zhang, Gaowen Liu et al.
DiffuMatch: Category-Agnostic Spectral Diffusion Priors for Robust Non-rigid Shape Matching
Emery Pierson, Lei Li, Angela Dai et al.
SAC-GNC: SAmple Consensus for adaptive Graduated Non-Convexity
Valter Piedade, Chitturi Sidhartha, José Gaspar et al.
Real3D: Towards Scaling Large Reconstruction Models with Real Images
Hanwen Jiang, Qixing Huang, Georgios Pavlakos
Do It Yourself: Learning Semantic Correspondence from Pseudo-Labels
Olaf Dünkel, Thomas Wimmer, Christian Theobalt et al.
Stochastic Interpolants for Revealing Stylistic Flows across the History of Art
Pingchuan Ma, Ming Gui, Johannes Schusterbauer et al.
Is Tracking really more challenging in First Person Egocentric Vision?
Matteo Dunnhofer, Zaira Manigrasso, Christian Micheloni
GeoExplorer: Active Geo-localization with Curiosity-Driven Exploration
Li Mi, Manon Béchaz, Zeming Chen et al.
Learning Large Motion Estimation from Intermediate Representations with a High-Resolution Optical Flow Dataset Featuring Long-Range Dynamic Motion
Hoonhee Cho, Yuhwan Jeong, Kuk-Jin Yoon
CCMNet: Leveraging Calibrated Color Correction Matrices for Cross-Camera Color Constancy
Dongyoung Kim, Mahmoud Afifi, Dongyun Kim et al.
MGSfM: Multi-Camera Geometry Driven Global Structure-from-Motion
Peilin Tao, Hainan Cui, Diantao Tu et al.
Zero-shot Inexact CAD Model Alignment from a Single Image
Pattaramanee Arsomngern, Sasikarn Khwanmuang, Matthias Nießner et al.
Motal: Unsupervised 3D Object Detection by Modality and Task-specific Knowledge Transfer
Hai Wu, Hongwei Lin, Xusheng Guo et al.
Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension
Xiyao Wang, Zhengyuan Yang, Linjie Li et al.
Manual-PA: Learning 3D Part Assembly from Instruction Diagrams
Jiahao Zhang, Anoop Cherian, Cristian Rodriguez-Opazo et al.
MoMa-Kitchen: A 100K+ Benchmark for Affordance-Grounded Last-Mile Navigation in Mobile Manipulation
Pingrui Zhang, Xianqiang Gao, Yuhan Wu et al.
NavQ: Learning a Q-Model for Foresighted Vision-and-Language Navigation
Peiran Xu, Xicheng Gong, Yadong Mu
GeoDiffusion: A Training-Free Framework for Accurate 3D Geometric Conditioning in Image Generation
Phillip Mueller, Talip Ünlü, Sebastian Schmidt et al.
OVA-Fields: Weakly Supervised Open-Vocabulary Affordance Fields for Robot Operational Part Detection
Heng Su, Mengying Xie, Nieqing Cao et al.
Arti-PG: A Toolbox for Procedurally Synthesizing Large-Scale and Diverse Articulated Objects with Rich Annotations
Jianhua Sun, Yuxuan Li, Jiude Wei et al.
Scaling 3D Compositional Models for Robust Classification and Pose Estimation
Xiaoding Yuan, Prakhar Kaushik, Guofeng Zhang et al.
X-Capture: An Open-Source Portable Device for Multi-Sensory Learning
Samuel Clarke, Suzannah Wistreich, Yanjie Ze et al.
DRaM-LHM: A Quaternion Framework for Iterative Camera Pose Estimation
Chen Lin, Weizhi Du, Zhixiang Min et al.
Prior-aware Dynamic Temporal Modeling Framework for Sequential 3D Hand Pose Estimation
Pengfei Ren, Jingyu Wang, Haifeng Sun et al.
Epipolar Consistent Attention Aggregation Network for Unsupervised Light Field Disparity Estimation
Chen Gao, Shuo Zhang, Youfang Lin
GloPER: Unsupervised Animal Pattern Extraction from Local Reconstruction
Bowen Chen, Yun Sing Koh, Gillian Dobbie
Focal Plane Visual Feature Generation and Matching on a Pixel Processor Array
Hongyi Zhang, Laurie Bose, Jianing Chen et al.
Seeing and Seeing Through the Glass: Real and Synthetic Data for Multi-Layer Depth Estimation
Hongyu Wen, Yiming Zuo, Venkat Subramanian et al.
SpatialTrackerV2: Advancing 3D Point Tracking with Explicit Camera Motion
Yuxi Xiao, Jianyuan Wang, Nan Xue et al.
A Simple yet Mighty Hartley Diffusion Versatilist for Generalizable Dense Vision Tasks
Qi Bi, Jingjun Yi, Huimin Huang et al.
IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation
Wenxuan Guo, Xiuwei Xu, Hang Yin et al.
AR-VRM: Imitating Human Motions for Visual Robot Manipulation with Analogical Reasoning
Dejie Yang, Zijing Zhao, Yang Liu
Unleashing the Temporal Potential of Stereo Event Cameras for Continuous-Time 3D Object Detection
Jae Young Kang, Hoonhee Cho, Kuk-Jin Yoon
PlaneRAS: Learning Planar Primitives for 3D Plane Recovery
Fang Zhang, Wenzhao Zheng, Linqing Zhao et al.
3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark
Wufei Ma, Haoyu Chen, Guofeng Zhang et al.
TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction
Xuying Zhang, Yutong Liu, Yangguang Li et al.
Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics
Taowen Wang, Cheng Han, James Liang et al.
Simultaneous Motion And Noise Estimation with Event Cameras
Shintaro Shiba, Yoshimitsu Aoki, Guillermo Gallego
Weakly-Supervised Learning of Dense Functional Correspondences
Stefan Stojanov, Linan Zhao, Yunzhi Zhang et al.
Layer-wise Vision Injection with Disentangled Attention for Efficient LVLMs
Xuange Zhang, Dengjie Li, Bo Liu et al.
StableDepth: Scene-Consistent and Scale-Invariant Monocular Depth
Zheng Zhang, Lihe Yang, Tianyu Yang et al.
4DSegStreamer: Streaming 4D Panoptic Segmentation via Dual Threads
Ling Liu, Jun Tian, Li Yi
HccePose (BF): Predicting Front & Back Surfaces to Construct Ultra-Dense 2D-3D Correspondences for Pose Estimation
Yulin Wang, Mengting Hu, Hongli Li et al.
GaussianVideo: Efficient Video Representation via Hierarchical Gaussian Splatting
Andrew Bond, Jui-Hsien Wang, Long Mai et al.
CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers
Dimitrios Mallis, Ahmet Karadeniz, Sebastian Cavada et al.
Exploring View Consistency for Scene-Adaptive Low-Light Light Field Image Enhancement
Shuo Zhang, Chen Gao, Youfang Lin
VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation under real Occlusions
Yash Garg, Saketh Bachu, Arindam Dutta et al.
Tracking Tiny Drones against Clutter: Large-Scale Infrared Benchmark with Motion-Centric Adaptive Algorithm
Jiahao Zhang, Zongli Jiang, Gang Wang et al.
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
Erik Daxberger, Nina Wenzel, David Griffiths et al.
AutoComPose: Automatic Generation of Pose Transition Descriptions for Composed Pose Retrieval Using Multimodal LLMs
Yi-Ting Shen, Sungmin Eum, Doheon Lee et al.
Understanding Flatness in Generative Models: Its Role and Benefits
Taehwan Lee, Kyeongkook Seo, Jaejun Yoo et al.
Image-Guided Shape-from-Template Using Mesh Inextensibility Constraints
Dinh-Vinh-Thuy Tran, Ruochen Chen, Shaifali Parashar
PHD: Personalized 3D Human Body Fitting with Point Diffusion
Hsuan-I Ho, Chen Guo, Po-Chen Wu et al.
Frequency Domain-Based Diffusion Model for Unpaired Image Dehazing
Chengxu Liu, Lu Qi, Jinshan Pan et al.
ScoreHOI: Physically Plausible Reconstruction of Human-Object Interaction via Score-Guided Diffusion
Ao Li, Jinpeng Liu, Yixuan Zhu et al.
MonoSOWA: Scalable monocular 3D Object detector Without human Annotations
Jan Skvrna, Lukas Neumann
Estimating 2D Camera Motion with Hybrid Motion Basis
Haipeng Li, Tianhao Zhou, Zhanglei Yang et al.
H3R: Hybrid Multi-view Correspondence for Generalizable 3D Reconstruction
Heng Jia, Na Zhao, Linchao Zhu
From Abyssal Darkness to Blinding Glare: A Benchmark on Extreme Exposure Correction in Real World
Bo Wang, Huiyuan Fu, Zhiye Huang et al.
TESPEC: Temporally-Enhanced Self-Supervised Pretraining for Event Cameras
Mohammad Mohammadi, Ziyi Wu, Igor Gilitschenski
Find Any Part in 3D
Ziqi Ma, Yisong Yue, Georgia Gkioxari
Global Motion Corresponder for 3D Point-Based Scene Interpolation under Large Motion
Junru Lin, Chirag Vashist, Mikaela Uy et al.
SpikeDiff: Zero-shot High-Quality Video Reconstruction from Chromatic Spike Camera and Sub-millisecond Spike Streams
Siqi Yang, Jinxiu Liang, Zhaojun Huang et al.
AJAHR: Amputated Joint Aware 3D Human Mesh Recovery
Hyunjin Cho, Giyun Choi, Jongwon Choi
EquiCaps: Predictor-Free Pose-Aware Pre-Trained Capsule Networks
Athinoulla Konstantinou, Georgios Leontidis, Mamatha Thota et al.
Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras
Shuang Guo, Friedhelm Hamann, Guillermo Gallego
6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting
Yufeng Jin, Vignesh Prasad, Snehal Jauhri et al.
Background Invariance Testing According to Semantic Proximity
Zukang Liao, Min Chen
One Look is Enough: Seamless Patchwise Refinement for Zero-Shot Monocular Depth Estimation on High-Resolution Images
Byeongjun Kwon, Munchurl Kim
Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision
Xiao Fang, Minhyek Jeon, Zheyang Qin et al.
RegGS: Unposed Sparse Views Gaussian Splatting with 3DGS Registration
Chong Cheng, Yu Hu, Sicheng Yu et al.
CObL: Toward Zero-Shot Ordinal Layering without User Prompting
Aneel Damaraju, Dean Hazineh, Todd Zickler
Hierarchical Material Recognition from Local Appearance
Matthew Beveridge, Shree Nayar
TopicGeo: An Efficient Unified Framework for Geolocation
Xin Wang, Xinlin Wang, Shuiping Gou
Revisiting Image Fusion for Multi-Illuminant White-Balance Correction
David Serrano, Aditya Arora, Luis Herranz et al.
Partially Matching Submap Helps: Uncertainty Modeling and Propagation for Text to Point Cloud Localization
Mingtao Feng, Longlong Mei, Zijie Wu et al.
Medical World Model
Yijun Yang, Zhao-Yang Wang, Qiuping Liu et al.
MaskHand: Generative Masked Modeling for Robust Hand Mesh Reconstruction in the Wild
Muhammad Usama Saleem, Ekkasit Pinyoanuntapong, Mayur Patel et al.
Uncertainty-Aware Gradient Stabilization for Small Object Detection
Huixin Sun, Yanjing Li, Linlin Yang et al.
CryoFastAR: Fast Cryo-EM Ab initio Reconstruction Made Easy
Jiakai Zhang, Shouchen Zhou, Haizhao Dai et al.
Beyond Pixel Uncertainty: Bounding the OoD Objects in Road Scenes
Huachao Zhu, Zelong Liu, Zhichao Sun et al.
Event-guided Unified Framework for Low-light Video Enhancement, Frame Interpolation, and Deblurring
Taewoo Kim, Kuk-Jin Yoon
PS-Mamba: Spatial-Temporal Graph Mamba for Pose Sequence Refinement
Haoye Dong, Gim Hee Lee
Spatial Alignment and Temporal Matching Adapter for Video-Radar Remote Physiological Measurement
Qian Liang, Ruixu Geng, Jinbo Chen et al.
Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation
Yusuke Hirota, Ryo Hachiuma, Boyi Li et al.
AGO: Adaptive Grounding for Open World 3D Occupancy Prediction
Peizheng Li, Shuxiao Ding, You Zhou et al.
Environment-Agnostic Pose: Generating Environment-independent Object Representations for 6D Pose Estimation
Shaobo Zhang, Yuhang Huang, Wanqing Zhao et al.
Online Dense Point Tracking with Streaming Memory
Qiaole Dong, Yanwei Fu
MaGS: Reconstructing and Simulating Dynamic 3D Objects with Mesh-adsorbed Gaussian Splatting
Shaojie Ma, Yawei Luo, Wei Yang et al.
CHARM3R: Towards Unseen Camera Height Robust Monocular 3D Detector
Abhinav Kumar, Yuliang Guo, Zhihao Zhang et al.
Test-Time Retrieval-Augmented Adaptation for Vision-Language Models
Xinqi Fan, Xueli Chen, Luoxiao Yang et al.
RnGCam: High-speed video from rolling & global shutter measurements
Kevin Tandi, Xiang Dai, Chinmay Talegaonkar et al.
Self-Supervised Monocular 4D Scene Reconstruction for Egocentric Videos
Chengbo Yuan, Geng Chen, Li Yi et al.
MixRI: Mixing Features of Reference Images for Novel Object Pose Estimation
Xinhang Liu, Jiawei Shi, Zheng Dang et al.
ReassembleNet: Learnable Keypoints and Diffusion for 2D Fresco Reconstruction
Adeela Islam, Stefano Fiorini, Stuart James et al.
WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions
Zizhang Li, Hong-Xing Yu, Wei Liu et al.
Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering
Kaixuan Jiang, Yang Liu, Weixing Chen et al.
Not all Views are Created Equal: Analyzing Viewpoint Instabilities in Vision Foundation Models
Mateusz Michalkiewicz, Xinyue Bai, Mahsa Baktashmotlagh et al.
CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image
Arindam Dutta, Meng Zheng, Zhongpai Gao et al.
ReCoT: Reflective Self-Correction Training for Mitigating Confirmation Bias in Large Vision-Language Models
Mengxue Qu, Yibo Hu, Kunyang Han et al.
GenHaze: Pioneering Controllable One-Step Realistic Haze Generation for Real-World Dehazing
Sixiang Chen, Tian Ye, Yunlong Lin et al.
OMNI-DC: Highly Robust Depth Completion with Multiresolution Depth Integration
Yiming Zuo, Willow Yang, Zeyu Ma et al.
GECO: Geometrically Consistent Embedding with Lightspeed Inference
Regine Hartwig, Dominik Muhle, Riccardo Marin et al.
Dream-to-Recon: Monocular 3D Reconstruction with Diffusion-Depth Distillation from Single Images
Philipp Wulff, Felix Wimbauer, Dominik Muhle et al.
LocalDyGS: Multi-view Global Dynamic Scene Modeling via Adaptive Local Implicit Feature Decoupling
Jiahao Wu, Rui Peng, Jianbo Jiao et al.
Combinative Matching for Geometric Shape Assembly
Nahyuk Lee, Juhong Min, Junhong Lee et al.
CogNav: Cognitive Process Modeling for Object Goal Navigation with LLMs
Yihan Cao, Jiazhao Zhang, Zhinan Yu et al.
TAPNext: Tracking Any Point (TAP) as Next Token Prediction
Artem Zholus, Carl Doersch, Yi Yang et al.