3D Object Detection

Detecting objects in 3D space

100 papers · 9,779 total citations
945 papers, Feb '24 to Jan '26
Also includes: 3D object detection, 3D detection, LiDAR detection, point cloud detection

Top Papers

#1

DETRs Beat YOLOs on Real-time Object Detection

Yian Zhao, Wenyu Lv, Shangliang Xu et al.

CVPR 2024
2,424 citations
#2

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

Tao Lu, Mulin Yu, Linning Xu et al.

CVPR 2024
589 citations
#3

Grounding Image Matching in 3D with MASt3R

Vincent Leroy, Yohann Cabon, Jerome Revaud

ECCV 2024
499 citations
#4

SplaTAM: Splat Track & Map 3D Gaussians for Dense RGB-D SLAM

Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula et al.

CVPR 2024
477 citations
#5

Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting

Zeyu Yang, Hongye Yang, Zijie Pan et al.

ICLR 2024
440 citations
#6

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

Bowen Wen, Wei Yang, Jan Kautz et al.

CVPR 2024
412 citations
#7

GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting

Chi Yan, Delin Qu, Dong Wang et al.

CVPR 2024
359 citations
#8

Text-to-3D using Gaussian Splatting

Zilong Chen, Feng Wang, Yikai Wang et al.

CVPR 2024
330 citations
#9

SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion

Vikram Voleti, Chun-Han Yao, Mark Boss et al.

ECCV 2024
315 citations
#10

DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision

Lu Ling, Yichen Sheng, Zhi Tu et al.

CVPR 2024
266 citations
#11

GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

Shunyuan Zheng, Boyao Zhou, Ruizhi Shao et al.

CVPR 2024
160 citations
#12

SweetDreamer: Aligning Geometric Priors in 2D diffusion for Consistent Text-to-3D

Weiyu Li, Rui Chen, Xuelin Chen et al.

ICLR 2024
151 citations
#13

Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection

Zhiyuan Yan, Yuhao Luo, Siwei Lyu et al.

CVPR 2024
133 citations
#14

Probing the 3D Awareness of Visual Foundation Models

Mohamed El Banani, Amit Raj, Kevis-Kokitsi Maninis et al.

CVPR 2024
130 citations
#15

Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection

Jiangnan Yang, Shuangli Liu, Jingjun Wu et al.

AAAI 2025
115 citations
#16

DimensionX: Create Any 3D and 4D Scenes from a Single Image with Decoupled Video Diffusion

Wenqiang Sun, Shuo Chen, Fangfu Liu et al.

ICCV 2025
103 citations
#17

3D Geometry-Aware Deformable Gaussian Splatting for Dynamic View Synthesis

Zhicheng Lu, Xiang Guo, Le Hui et al.

CVPR 2024
99 citations
#18

OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

Zhening Huang, Xiaoyang Wu, Xi Chen et al.

ECCV 2024 · arXiv:2309.00616
Keywords: open-vocabulary instance segmentation, 3D scene understanding, vision-language models, point cloud segmentation
82 citations
#19

IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection

Junbo Yin, Wenguan Wang, Runnan Chen et al.

CVPR 2024
81 citations
#20

CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field

Jiarui Hu, Xianhao Chen, Boyin Feng et al.

ECCV 2024
78 citations
#21

RGBD GS-ICP SLAM

Seongbo Ha, Jiung Yeon, Hyeonwoo Yu

ECCV 2024
70 citations
#22

Frequency-Spatial Entanglement Learning for Camouflaged Object Detection

Yanguang Sun, Chunyan Xu, Jian Yang et al.

ECCV 2024
68 citations
#23

TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos

Yufu Wang, Ziyun Wang, Lingjie Liu et al.

ECCV 2024 · arXiv:2403.17346
Keywords: human motion reconstruction, global trajectory estimation, SLAM robustification, video transformer model
66 citations
#24

Unifying 3D Vision-Language Understanding via Promptable Queries

Ziyu Zhu, Zhuofan Zhang, Xiaojian Ma et al.

ECCV 2024
64 citations
#25

MonoCD: Monocular 3D Object Detection with Complementary Depths

Longfei Yan, Pei Yan, Shengzhou Xiong et al.

CVPR 2024
64 citations
#26

Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos

Linyi Jin, Richard Tucker, Zhengqi Li et al.

CVPR 2025 · arXiv:2412.09621
Keywords: stereo depth estimation, 4D reconstruction, dynamic 3D scenes, camera pose estimation
58 citations
#27

DQ-DETR: DETR with Dynamic Query for Tiny Object Detection

Yi-Xin Huang, Hou-I Liu, Hong-Han Shuai et al.

ECCV 2024 · arXiv:2404.03507
Keywords: tiny object detection, DETR-like methods, dynamic query selection, object query adjustment
56 citations
#28

Controlling Space and Time with Diffusion Models

Daniel Watson, Saurabh Saxena, Lala Li et al.

ICLR 2025
55 citations
#29

Improving 2D Feature Representations by 3D-Aware Fine-Tuning

Yuanwen Yue, Anurag Das, Francis Engelmann et al.

ECCV 2024 · arXiv:2407.20229
Keywords: 3D Gaussian representation, semantic feature lifting, 3D-aware fine-tuning, 2D foundation models
55 citations
#30

Wonderland: Navigating 3D Scenes from a Single Image

Hanwen Liang, Junli Cao, Vidit Goel et al.

CVPR 2025
54 citations
#31

Towards Scalable 3D Anomaly Detection and Localization: A Benchmark via 3D Anomaly Synthesis and A Self-Supervised Learning Network

Wenqiao Li, Xiaohao Xu, Yao Gu et al.

CVPR 2024
50 citations
#32

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

Lewei Yao, Renjie Pi, Jianhua Han et al.

CVPR 2024
45 citations
#33

ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance

Yongwei Chen, Tengfei Wang, Tong Wu et al.

ECCV 2024 · arXiv:2403.12409
Keywords: 3D asset generation, single-image 3D generation, spatially-aware diffusion guidance, score distillation sampling
45 citations
#34

FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally

Qiuhong Shen, Xingyi Yang, Xinchao Wang

ECCV 2024
45 citations
#35

HouseCat6D - A Large-Scale Multi-Modal Category Level 6D Object Perception Dataset with Household Objects in Realistic Scenarios

HyunJun Jung, Shun-Cheng Wu, Patrick Ruhkamp et al.

CVPR 2024
42 citations
#36

UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes

David Rozenberszki, Or Litany, Angela Dai

CVPR 2024
40 citations
#37

EgoLifter: Open-world 3D Segmentation for Egocentric Perception

Qiao Gu, Zhaoyang Lv, Duncan Frost et al.

ECCV 2024
40 citations
#38

Scene Adaptive Sparse Transformer for Event-based Object Detection

Yansong Peng, Li Hebei, Yueyi Zhang et al.

CVPR 2024
40 citations
#39

Unbiased Faster R-CNN for Single-source Domain Generalized Object Detection

Yajing Liu, Shijun Zhou, Xiyao Liu et al.

CVPR 2024
37 citations
#40

SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving

Georg Hess, Carl Lindström, Maryam Fatemi et al.

CVPR 2025
37 citations
#41

Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion

Lucas Nunes, Rodrigo Marcuzzi, Benedikt Mersch et al.

CVPR 2024
37 citations
#42

GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting

Wanshui Gan, Fang Liu, Hongbin Xu et al.

ICCV 2025
37 citations
#43

R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection

Zheyuan Zhou, Wang Le, Naiyu Fang et al.

ECCV 2024
36 citations
#44

UnO: Unsupervised Occupancy Fields for Perception and Forecasting

Ben Agro, Quinlan Sykora, Sergio Casas et al.

CVPR 2024
35 citations
#45

ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association

Shuxiao Ding, Lukas Schneider, Marius Cordts et al.

CVPR 2024
34 citations
#46

Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models

Yoad Tewel, Rinon Gal, Dvir Samuel et al.

ICLR 2025 · arXiv:2411.07232
Keywords: attention mechanism, diffusion models, semantic image editing, object insertion
34 citations
#47

AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation

Yuanwen Yue, Sabarinath Mahadevan, Jonas Schult et al.

ICLR 2024
34 citations
#48

DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input

Qijian Tian, Xin Tan, Yuan Xie et al.

AAAI 2025
34 citations
#49

Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal

Yuxin Wang, Qianyi Wu, Guofeng Zhang et al.

ECCV 2024
33 citations
#50

Towards Generalizable Multi-Object Tracking

Zheng Qin, Le Wang, Sanping Zhou et al.

CVPR 2024
32 citations
#51

SAM-guided Graph Cut for 3D Instance Segmentation

Haoyu Guo, He Zhu, Sida Peng et al.

ECCV 2024
32 citations
#52

RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation

Haiming Zhang, Xu Yan, Dongfeng Bai et al.

AAAI 2024 · arXiv:2312.11829
Keywords: 3D occupancy prediction, cross-modal knowledge distillation, multi-view images, volume rendering
31 citations
#53

Open-World Human-Object Interaction Detection via Multi-modal Prompts

Jie Yang, Bingliang Li, Ailing Zeng et al.

CVPR 2024
31 citations
#54

CAT: Exploiting Inter-Class Dynamics for Domain Adaptive Object Detection

Mikhail Kennerley, Jian-Gang Wang, Bharadwaj Veeravalli et al.

CVPR 2024
31 citations
#55

3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

Jianing "Jed" Yang, Xuweiyi Chen, Nikhil Madaan et al.

CVPR 2025
30 citations
#56

3D-HGS: 3D Half-Gaussian Splatting

Haolin Li, Jinyang Liu, Mario Sznaier et al.

CVPR 2025
30 citations
#57

DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes

Chensheng Peng, Chengwei Zhang, Yixiao Wang et al.

CVPR 2025 · arXiv:2411.11921
Keywords: Gaussian splatting, static-dynamic decomposition, surface reconstruction, autonomous driving
29 citations
#58

LaneCPP: Continuous 3D Lane Detection using Physical Priors

Maximilian Pittner, Joel Janai, Alexandru Paul Condurache

CVPR 2024
28 citations
#59

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

Zehan Wang, Ziang Zhang, Tianyu Pang et al.

ICML 2025
28 citations
#60

I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions

Chengfeng Zhao, Juze Zhang, Jiashen Du et al.

CVPR 2024
28 citations
#61

No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation

Xiangyang Zhu, Renrui Zhang, Bowei He et al.

CVPR 2024
27 citations
#62

CAT-3DGS: A Context-Adaptive Triplane Approach to Rate-Distortion-Optimized 3DGS Compression

Yu-Ting Zhan, Cheng-Yuan Ho, He-Bi Yang et al.

ICLR 2025 · arXiv:2503.00357
Keywords: 3D Gaussian splatting, rate-distortion optimization, 3D representation compression, autoregressive entropy coding
26 citations
#63

FRED: Towards a Full Rotation-Equivariance in Aerial Image Object Detection

Chanho Lee, Jinsu Son, Hyounguk Shon et al.

AAAI 2024 · arXiv:2401.06159
Keywords: rotation-equivariance, oriented object detection, deformable convolution, aerial image analysis
26 citations
#64

Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion

Zuoyue Li, Zhenqiang Li, Zhaopeng Cui et al.

CVPR 2024
26 citations
#65

Multi-Object Tracking in the Dark

Xinzhe Wang, Kang Ma, Qiankun Liu et al.

CVPR 2024
25 citations
#66

CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement

Yun Liu, Chengwen Zhang, Ruofan Xing et al.

CVPR 2025
25 citations
#67

Motion2VecSets: 4D Latent Vector Set Diffusion for Non-rigid Shape Reconstruction and Tracking

Wei Cao, Chang Luo, Biao Zhang et al.

CVPR 2024
25 citations
#68

Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion

Linzhan Mou, Jun-Kun Chen, Yu-Xiong Wang

CVPR 2024
25 citations
#69

Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models

Hyeonwoo Kim, Sookwan Han, Patrick Kwon et al.

ECCV 2024
25 citations
#70

Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment

Muhammad Sohail Danish, Muhammad Haris Khan, Muhammad Akhtar Munir et al.

CVPR 2024
24 citations
#71

MonoDiff: Monocular 3D Object Detection and Pose Estimation with Diffusion Models

Yasiru Ranasinghe, Deepti Hegde, Vishal M. Patel

CVPR 2024
24 citations
#72

SimDistill: Simulated Multi-Modal Distillation for BEV 3D Object Detection

Haimei Zhao, Qiming Zhang, Shanshan Zhao et al.

AAAI 2024 · arXiv:2303.16818
Keywords: 3D object detection, multi-view camera, LiDAR-camera fusion, bird's-eye-view space
24 citations
#73

OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection

Hu Zhang, Xu Jianhua, Tao Tang et al.

ECCV 2024
24 citations
#74

LISO: Lidar-only Self-Supervised 3D Object Detection

Stefan Baur, Frank Moosmann, Andreas Geiger

ECCV 2024
24 citations
#75

Doodle Your 3D: From Abstract Freehand Sketches to Precise 3D Shapes

Hmrishav Bandyopadhyay, Subhadeep Koley, Ayan Das et al.

CVPR 2024
24 citations
#76

Towards Robust 3D Object Detection with LiDAR and 4D Radar Fusion in Various Weather Conditions

Yujeong Chae, Hyeonseong Kim, Kuk-Jin Yoon

CVPR 2024
23 citations
#77

Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces

Chenyangguang Zhang, Alexandros Delitzas, Fangjinhua Wang et al.

CVPR 2025
23 citations
#78

MANUS: Markerless Grasp Capture using Articulated 3D Gaussians

Chandradeep Pokhariya, Ishaan Shah, Angela Xing et al.

CVPR 2024
23 citations
#79

FM-OV3D: Foundation Model-Based Cross-Modal Knowledge Blending for Open-Vocabulary 3D Detection

Dongmei Zhang, Chang Li, Renrui Zhang et al.

AAAI 2024 · arXiv:2312.14465
Keywords: open-vocabulary 3D detection, cross-modal knowledge blending, foundation models, Grounded-Segment-Anything
22 citations
#80

GAFusion: Adaptive Fusing LiDAR and Camera with Multiple Guidance for 3D Object Detection

Xiaotian Li, Baojie Fan, Jiandong Tian et al.

CVPR 2024
22 citations
#81

Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection

Le Yang, Ziwei Zheng, Boxu Chen et al.

CVPR 2025
22 citations
#82

Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation

Mohamed El Amine Boudjoghra, Angela Dai, Jean Lahoud et al.

ICLR 2025
21 citations
#83

SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE

Yongwei Chen, Yushi Lan, Shangchen Zhou et al.

CVPR 2025 · arXiv:2411.16856
Keywords: 3D object generation, autoregressive models, vector-quantized variational autoencoder, multi-scale representation
21 citations
#84

Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties

Wenqiao Li, BoZhong Zheng, Xiaohao Xu et al.

CVPR 2025
20 citations
#85

Deep Orthogonal Hypersphere Compression for Anomaly Detection

Yunhe Zhang, Yan Sun, Jinyu Cai et al.

ICLR 2024
19 citations
#86

LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels

Tuo Feng, Wenguan Wang, Fan Ma et al.

CVPR 2024
19 citations
#87

CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction

Zhangchen Ye, Tao Jiang, Chenfeng Xu et al.

ECCV 2024 · arXiv:2409.13430
Keywords: 3D occupancy prediction, cost volume fusion, temporal feature integration, monocular depth estimation
19 citations
#88

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

Yunhan Yang, Yukun Huang, Xiaoyang Wu et al.

CVPR 2024
19 citations
#89

Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection

Jiacheng Zhang, Jiaming Li, Xiangru Lin et al.

CVPR 2024
19 citations
#90

Scalable 3D Registration via Truncated Entry-wise Absolute Residuals

Tianyu Huang, Liangzu Peng, Rene Vidal et al.

CVPR 2024
19 citations
#91

SEED: A Simple and Effective 3D DETR in Point Clouds

Zhe Liu, Jinghua Hou, Xiaoqing Ye et al.

ECCV 2024
19 citations
#92

Cubify Anything: Scaling Indoor 3D Object Detection

Justin Lazarow, David Griffiths, Gefen Kohavi et al.

CVPR 2025
18 citations
#93

Dense Projection for Anomaly Detection

Dazhi Fu, Zhao Zhang, Jicong Fan

AAAI 2024
18 citations
#94

Zero-Shot Aerial Object Detection with Visual Description Regularization

Chenyu Lin, Zhengqing Zang, Chenwei Tang et al.

AAAI 2024 · arXiv:2402.18233
Keywords: zero-shot detection, aerial object detection, visual description regularization, semantic-visual correlation
18 citations
#95

Crowd-SAM: SAM as a smart annotator for object detection in crowded scenes

Zhi Cai, Yingjie Gao, Yaoyan Zheng et al.

ECCV 2024
18 citations
#96

HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud

Wencan Cheng, Hao Tang, Luc Van Gool et al.

CVPR 2024
17 citations
#97

Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors

Haoxuanye Ji, Pengpeng Liang, Erkang Cheng

CVPR 2024
17 citations
#98

3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views

Xiaobiao Du, Yida Wang, Haiyang Sun et al.

ICCV 2025
17 citations
#99

Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance

Tien Toan Nguyen, Minh Nhat Nhat Vu, Baoru Huang et al.

ECCV 2024 · arXiv:2407.13842
Keywords: 6-DoF grasp detection, language-driven robotics, point cloud processing, negative prompt guidance
17 citations
#100

SPEAL: Skeletal Prior Embedded Attention Learning for Cross-Source Point Cloud Registration

Kezheng Xiong, Maoji Zheng, Qingshan Xu et al.

AAAI 2024 · arXiv:2312.08664
Keywords: point cloud registration, cross-source point clouds, skeletal representations, unsupervised skeleton extraction
17 citations