🧬3D Vision

Depth Estimation

Estimating depth from images

100 papers5,169 total citations

Compare with other topics

Feb '24 — Jan '26814 papers

Top Conferences

CVPR: 52 ECCV: 23 AAAI: 12 ICLR: 7 ICCV: 5 NeurIPS: 1

Top Papers

#1

Grounding Image Matching in 3D with MASt3R

Vincent Leroy, Yohann Cabon, Jerome Revaud

GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting

Chi Yan, Delin Qu, Dong Wang et al.

MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

Yuedong Chen, Haofei Xu, Chuanxia Zheng et al.

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

Alexey Bochkovskiy, Amaël Delaunoy, Hugo Germain et al.

MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion

Junyi Zhang, Charles Herrmann, Junhwa Hur et al.

Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection

Jiangnan Yang, Shuangli Liu, Jingjun Wu et al.

DimensionX: Create Any 3D and 4D Scenes from a Single Image with Decoupled Video Diffusion

Wenqiang Sun, Shuo Chen, Fangfu Liu et al.

Zero-Reference Low-Light Enhancement via Physical Quadruple Priors

Wenjing Wang, Huan Yang, Jianlong Fu et al.

DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation

Yiqun Duan, Xianda Guo, Zheng Zhu

DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation

Bowen Yin, Xuying Zhang, Zhong-Yu Li et al.

Pixel-GS Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

Zheng Zhang, WENBO HU, Yixing Lao et al.

DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing

Chong Mou, Xintao Wang, Jiechong Song et al.

AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error

Jonas Ricker, Denis Lukovnikov, Asja Fischer

Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining

Xiang Chen, Jinshan Pan, Jiangxin Dong

DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching

Ming Gui, Johannes Schusterbauer, Ulrich Prestel et al.

MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds

Zhenggang Tang, Yuchen Fan, Dilin Wang et al.

CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field

Jiarui Hu, Xianhao Chen, Boyin Feng et al.

ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion

Jiayu Yang, Ziang Cheng, Yunfei Duan et al.

MonoCD: Monocular 3D Object Detection with Complementary Depths

Longfei Yan, Pei Yan, Shengzhou Xiong et al.

Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis

Yanzuo Lu, Manlin Zhang, Jinhua Ma et al.

DQ-DETR: DETR with Dynamic Query for Tiny Object Detection

Yi-Xin Huang, Hou-I Liu, Hong-Han Shuai et al.

ECCV 2024arXiv:2404.03507

tiny object detectiondetr-like methodsdynamic query selectionobject query adjustment+4

56

citations

#22

Improving 2D Feature Representations by 3D-Aware Fine-Tuning

Yuanwen Yue, Anurag Das, Francis Engelmann et al.

ECCV 2024arXiv:2407.20229

3d gaussian representationsemantic feature lifting3d-aware fine-tuning2d foundation models+4

55

citations

#23

Wonderland: Navigating 3D Scenes from a Single Image

Hanwen Liang, Junli Cao, Vidit Goel et al.

FakeInversion: Learning to Detect Images from Unseen Text-to-Image Models by Inverting Stable Diffusion

George Cazenavette, Avneesh Sud, Thomas Leung et al.

A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint

Xiaofeng Cong, Jie Gui, Jing Zhang et al.

SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation

Dong Wu, Mingmin Chi, Xuan Zang et al.

AAAI 2024arXiv:2309.00526

monocular depth estimationself-supervised learningself-cost volumescene geometry+4

52

citations

#27

Bilateral Propagation Network for Depth Completion

Jie Tang, Fei-Peng Tian, Boshi An et al.

PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation

Zhenyu Li, Shariq Bhat, Peter Wonka

Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders

Rui Chen, Jianfeng Zhang, Yixun Liang et al.

Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval

Yucheng Suo, Fan Ma, Linchao Zhu et al.

Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

Haotong Lin, Sida Peng, Jingxiao Chen et al.

MET3R: Measuring Multi-View Consistency in Generated Images

Mohammad Asim, Christopher Wewer, Thomas Wimmer et al.

CVPR 2025arXiv:2501.06336

multi-view consistencyimage generation3d reconstructionnovel view synthesis+3

43

citations

#33

Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach

Guoqiang Liang, Kanghao Chen, Hangyu Li et al.

Learning Diffusion Texture Priors for Image Restoration

Tian Ye, Sixiang Chen, Wenhao Chai et al.

GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting

Wanshui Gan, Fang Liu, Hongbin Xu et al.

Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion

Lucas Nunes, Rodrigo Marcuzzi, Benedikt Mersch et al.

Distilling Semantic Priors from SAM to Efficient Image Restoration Models

Quan Zhang, Xiaoyu Liu, Wei Li et al.

DEFOM-Stereo: Depth Foundation Model Based Stereo Matching

Hualie Jiang, Zhiqiang Lou, Laiyan Ding et al.

SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM Optimization

Zhenlong Yuan, Jiakai Cao, Zhaoxin Li et al.

AAAI 2024arXiv:2401.06385

multi-view stereo3d reconstructiontextureless areassegment anything model+4

35

citations

#40

NeuSurf: On-Surface Priors for Neural Surface Reconstruction from Sparse Input Views

Han Huang, Yulun Wu, Junsheng Zhou et al.

AAAI 2024arXiv:2312.13977

neural implicit functionsmulti-view reconstructionsparse view reconstructionsurface reconstruction+3

35

citations

#41

Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection

Feng Liu, Tengteng Huang, Qianjing Zhang et al.

Synergistic Multiscale Detail Refinement via Intrinsic Supervision for Underwater Image Enhancement

Dehuan Zhang, Jingchun Zhou, Chunle Guo et al.

AAAI 2024arXiv:2308.11932

underwater image enhancementmulti-scale detail refinementintrinsic supervisionadaptive feature propagation+3

33

citations

#43

Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion

Fan Zhang, Shaodi You, Yu Li et al.

Material Palette: Extraction of Materials from a Single Image

Ivan Lopes, Fabio Pizzati, Raoul de Charette

RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation

Haiming Zhang, Xu Yan, Dongfeng Bai et al.

AAAI 2024arXiv:2312.11829

3d occupancy predictioncross-modal knowledge distillationmulti-view imagesvolume rendering+4

31

citations

#46

Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network

ye junyan, Zhutao Lv, Li Weijia et al.

Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation

Xiaoyang Wang, Huihui Bai, Limin Yu et al.

Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning

xin zhang, Jiawei Du, Weiying Xie et al.

Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions

Fabio Tosi, Pierluigi Zama Ramirez, Matteo Poggi

ECCV 2024arXiv:2407.16698

monocular depth estimationdiffusion modelstext-to-image generationdepth-aware control+4

29

citations

#50

VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors

Sungwon Hwang, Min-Jung Kim, Taewoong Kang et al.

ECCV 2024arXiv:2407.02945

urban scene reconstruction3d gaussian splattingextrapolated view synthesisneural rendering+4

28

citations

#51

LaneCPP: Continuous 3D Lane Detection using Physical Priors

Maximilian Pittner, Joel Janai, Alexandru Paul Condurache

Blind Image Quality Assessment Based on Geometric Order Learning

Nyeong-Ho Shin, Seon-Ho Lee, Chang-Su Kim

ZeroRF: Fast Sparse View 360° Reconstruction with Zero Pretraining

Ruoxi Shi, Xinyue Wei, Cheng Wang et al.

MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors

Qingming LIU, Yuan Liu, Jiepeng Wang et al.

Boosting Spike Camera Image Reconstruction from a Perspective of Dealing with Spike Fluctuations

Rui Zhao, Ruiqin Xiong, Jing Zhao et al.

Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations

Tomáš Chobola, Yu Liu, Hanyi Zhang et al.

MonoDiff: Monocular 3D Object Detection and Pose Estimation with Diffusion Models

Yasiru Ranasinghe, Deepti Hegde, Vishal M. Patel

Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment

Ziyu Shan, Yujie Zhang, Qi Yang et al.

Test-Time Adaptation for Depth Completion

Hyoungseob Park, Anjali W Gupta, Alex Wong

GeoCalib: Learning Single-image Calibration with Geometric Optimization

Alexander Veicht, Paul-Edouard Sarlin, Philipp Lindenberger et al.

Deep SE(3)-Equivariant Geometric Reasoning for Precise Placement Tasks

Ben Eisner, Yi Yang, Todor Davchev et al.

ZeST: Zero-Shot Material Transfer from a Single Image

Ta-Ying Cheng, Prafull Sharma, Andrew Markham et al.

UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement

yaofeng xie, Lingwei Kong, Kai Chen et al.

GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion

Jiapeng Tang, Davide Davoli, Tobias Kirschstein et al.

AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis

Tao Tang, Guangrun Wang, Yixing Lao et al.

Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion

Massimiliano Viola, Kevin Qu, Nando Metzger et al.

Video Depth without Video Models

Bingxin Ke, Dominik Narnhofer, Shengyu Huang et al.

GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors

Tian-Xing Xu, Xiangjun Gao, Wenbo Hu et al.

CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction

Zhangchen Ye, Tao Jiang, Chenfeng Xu et al.

ECCV 2024arXiv:2409.13430

3d occupancy predictioncost volume fusiontemporal feature integrationmonocular depth estimation+3

19

citations

#70

LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels

Tuo Feng, Wenguan Wang, Fan Ma et al.

FatesGS: Fast and Accurate Sparse-View Surface Reconstruction Using Gaussian Splatting with Depth-Feature Consistency

Han Huang, Yulun Wu, Chao Deng et al.

Zero-Shot Aerial Object Detection with Visual Description Regularization

Chenyu Lin, Zhengqing Zang, Chenwei Tang et al.

AAAI 2024arXiv:2402.18233

zero-shot detectionaerial object detectionvisual description regularizationsemantic-visual correlation+4

18

citations

#73

Dense Projection for Anomaly Detection

Dazhi Fu, Zhao Zhang, Jicong Fan

Lifting by Image – Leveraging Image Cues for Accurate 3D Human Pose Estimation

Feng Zhou, Jianqin Yin, Peiyang Li

AAAI 2024arXiv:2312.15636

3d human pose estimationdepth ambiguity problemimage feature selectionpose-guided transformer+4

18

citations

#75

Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders

Fiona Ryan, Ajay Bati, Sangmin Lee et al.

Latent Diffusion Prior Enhanced Deep Unfolding for Snapshot Spectral Compressive Imaging

Zongliang Wu, Ruiying Lu, Ying Fu et al.

HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud

WENCAN CHENG, Hao Tang, Luc Van Gool et al.

Adaptive Rectangular Convolution for Remote Sensing Pansharpening

Xueyang Wang, Zhixin Zheng, Jiandong Shao et al.

TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving

Cheng Zhao, su sun, Ruoyu Wang et al.

SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild

Andreas Engelhardt, Amit Raj, Mark Boss et al.

PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation

Zhenyu Li, Shariq Farooq Bhat, Peter Wonka

Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses

Inhee Lee, Byungjun Kim, Hanbyul Joo

Deep Diffusion Image Prior for Efficient OOD Adaptation in 3D Inverse Problems

Hyungjin Chung, Jong Chul Ye

AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation

Yangchao Wu, Tian Yu Liu, Hyoungseob Park et al.

IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations

Zhibing Li, Tong Wu, Jing Tan et al.

VEON: Vocabulary-Enhanced Occupancy Prediction

Jilai Zheng, Pin Tang, Zhongdao Wang et al.

DiffSCI: Zero-Shot Snapshot Compressive Imaging via Iterative Spectral Diffusion Model

Zhenghao Pan, Haijin Zeng, Jiezhang Cao et al.

Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models

Weiwei Cao, Jianpeng Zhang, Yingda Xia et al.

Synthetic-to-Real Self-supervised Robust Depth Estimation via Learning with Motion and Structure Priors

Weilong Yan, Ming Li, Li Haipeng et al.

AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis

Khiem Vuong, Anurag Ghosh, Deva Ramanan et al.

DVP-MVS: Synergize Depth-Edge and Visibility Prior for Multi-View Stereo

Zhenlong Yuan, Jinguo Luo, Fei Shen et al.

DV-3DLane: End-to-end Multi-modal 3D Lane Detection with Dual-view Representation

Yueru Luo, Shuguang Cui, Zhen Li

Jasmine: Harnessing Diffusion Prior for Self-supervised Depth Estimation

Jiyuan Wang, Chunyu Lin, cheng guan et al.

NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting

Yulong Zheng, Zicheng Jiang, Shengfeng He et al.

LUDVIG: Learning-Free Uplifting of 2D Visual Features to Gaussian Splatting Scenes

Juliette Marrie, Romain Menegaux, Michael Arbel et al.

Completion as Enhancement: A Degradation-Aware Selective Image Guided Network for Depth Completion

Zhiqiang Yan, Zhengxue Wang, Kun Wang et al.

CVPR 2025arXiv:2412.19225

depth completiondegradation-aware frameworkrgb-d fusionmulti-modal conditional mamba+3

12

citations

#97

Weakly Supervised Monocular 3D Detection with a Single-View Image

Xueying Jiang, Sheng Jin, Lewei Lu et al.

OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection

Jinghua Hou, Tong Wang, Xiaoqing Ye et al.

Learning Hazing to Dehazing: Towards Realistic Haze Generation for Real-World Image Dehazing

Ruiyi Wang, Yushuo Zheng, Zicheng Zhang et al.

MICDrop: Masking Image and Depth Features via Complementary Dropout for Domain-Adaptive Semantic Segmentation

Linyan Yang, Lukas Hoyer, Mark Weber et al.

ECCV 2024arXiv:2408.16478

unsupervised domain adaptationsemantic segmentationdomain gap bridginggeometric information integration+3

12

citations

Depth Estimation

Top Conferences

Related Topics (3D Vision)

Top Papers

Grounding Image Matching in 3D with MASt3R

GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting

MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion

Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection

DimensionX: Create Any 3D and 4D Scenes from a Single Image with Decoupled Video Diffusion

Zero-Reference Low-Light Enhancement via Physical Quadruple Priors

DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation

DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation

Pixel-GS Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing

AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error

Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining

DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching

MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds

CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field

ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion

MonoCD: Monocular 3D Object Detection with Complementary Depths

Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis

DQ-DETR: DETR with Dynamic Query for Tiny Object Detection

Improving 2D Feature Representations by 3D-Aware Fine-Tuning

Wonderland: Navigating 3D Scenes from a Single Image

FakeInversion: Learning to Detect Images from Unseen Text-to-Image Models by Inverting Stable Diffusion

A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint

SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation

Bilateral Propagation Network for Depth Completion

PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation

Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders

Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval

Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

MET3R: Measuring Multi-View Consistency in Generated Images

Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach

Learning Diffusion Texture Priors for Image Restoration

GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting

Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion

Distilling Semantic Priors from SAM to Efficient Image Restoration Models

DEFOM-Stereo: Depth Foundation Model Based Stereo Matching

SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM Optimization

NeuSurf: On-Surface Priors for Neural Surface Reconstruction from Sparse Input Views

Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection

Synergistic Multiscale Detail Refinement via Intrinsic Supervision for Underwater Image Enhancement

Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion

Material Palette: Extraction of Materials from a Single Image

RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation

Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network

Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation

Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning

Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions

VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors

LaneCPP: Continuous 3D Lane Detection using Physical Priors

Blind Image Quality Assessment Based on Geometric Order Learning

ZeroRF: Fast Sparse View 360° Reconstruction with Zero Pretraining

MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors

Boosting Spike Camera Image Reconstruction from a Perspective of Dealing with Spike Fluctuations

Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations

MonoDiff: Monocular 3D Object Detection and Pose Estimation with Diffusion Models

Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment

Test-Time Adaptation for Depth Completion

GeoCalib: Learning Single-image Calibration with Geometric Optimization

Deep SE(3)-Equivariant Geometric Reasoning for Precise Placement Tasks

ZeST: Zero-Shot Material Transfer from a Single Image

UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement

GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion

AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis

Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion

Video Depth without Video Models

GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors

CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction

LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels

FatesGS: Fast and Accurate Sparse-View Surface Reconstruction Using Gaussian Splatting with Depth-Feature Consistency

Zero-Shot Aerial Object Detection with Visual Description Regularization

Dense Projection for Anomaly Detection

Lifting by Image – Leveraging Image Cues for Accurate 3D Human Pose Estimation

Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders

Latent Diffusion Prior Enhanced Deep Unfolding for Snapshot Spectral Compressive Imaging