"scene understanding" Papers

105 papers found • Page 2 of 3

Prioritizing Perception-Guided Self-Supervision: A New Paradigm for Causal Modeling in End-to-End Autonomous Driving

Yi Huang, Zhan Qu, Lihui Jiang et al.

NEURIPS 2025arXiv:2511.08214
1
citations

Promptable 3-D Object Localization with Latent Diffusion Models

Cheng-Yao Hong, Li-Heng Wang, Tyng-Luh Liu

NEURIPS 2025

PSI: A Benchmark for Human Interpretation and Response in Traffic Interactions

TAOTAO JING, Tina Chen, Renran Tian et al.

NEURIPS 2025

Relation3D : Enhancing Relation Modeling for Point Cloud Instance Segmentation

Edward LOO, Jiacheng Deng

CVPR 2025arXiv:2506.17891
6
citations

RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving

Zhijian Huang, Chengjian Feng, Baihui Xiao et al.

ICCV 2025arXiv:2412.07689
12
citations

Robust 3D Object Detection using Probabilistic Point Clouds from Single-Photon LiDARs

Bhavya Goyal, Felipe Gutierrez-Barragan, Wei Lin et al.

ICCV 2025arXiv:2508.00169
2
citations

SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining

Yue Li, Qi Ma, Runyi Yang et al.

ICCV 2025arXiv:2503.18052
21
citations

Seg2Box: 3D Object Detection by Point-Wise Semantics Supervision

Maoji Zheng, Ziyu Xu, Qiming Xia et al.

AAAI 2025paperarXiv:2503.16811
3
citations

SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation

Yunxiang Fu, Meng Lou, Yizhou Yu

CVPR 2025arXiv:2412.11890
25
citations

Spiking Vision Transformer with Saccadic Attention

Shuai Wang, Malu Zhang, Dehao Zhang et al.

ICLR 2025oralarXiv:2502.12677
17
citations

Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation

Yong Liu, Song-Li Wu, Sule Bai et al.

ICCV 2025arXiv:2506.16058
2
citations

Supercharging Floorplan Localization with Semantic Rays

Yuval Grader, Hadar Averbuch-Elor

ICCV 2025arXiv:2507.09291
2
citations

TACO: Taming Diffusion for in-the-wild Video Amodal Completion

Ruijie Lu, Yixin Chen, Yu Liu et al.

ICCV 2025arXiv:2503.12049
10
citations

Task-Oriented Human Grasp Synthesis via Context- and Task-Aware Diffusers

An Lun Liu, Yu-Wei Chao, Yi-Ting Chen

ICCV 2025arXiv:2507.11287

The 3D-PC: a benchmark for visual perspective taking in humans and machines

Drew Linsley, Peisen Zhou, Alekh Ashok et al.

ICLR 2025arXiv:2406.04138
10
citations

TopoPoint: Enhance Topology Reasoning via Endpoint Detection in Autonomous Driving

Yanping Fu, Xinyuan Liu, Tianyu Li et al.

NEURIPS 2025arXiv:2505.17771
4
citations

Towards Efficient Foundation Model for Zero-shot Amodal Segmentation

Zhaochen Liu, Limeng Qiao, Xiangxiang Chu et al.

CVPR 2025
3
citations

Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA

Zhixuan Li, Hyunse Yoon, Sanghoon Lee et al.

ICCV 2025arXiv:2503.10225
3
citations

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

Baichuan Zhou, Haote Yang, Dairong Chen et al.

AAAI 2025paperarXiv:2408.17267
26
citations

Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting

Anand Bhattad, Konpat Preechakul, Alexei Efros

NEURIPS 2025arXiv:2503.21770
8
citations

3D Small Object Detection with Dynamic Spatial Pruning

Xiuwei Xu, Zhihao Sun, Ziwei Wang et al.

ECCV 2024arXiv:2305.03716
9
citations

An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding

Wei Chen, Long Chen, Yu Wu

ECCV 2024arXiv:2408.01120
17
citations

Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA

Wentao Mo, Yang Liu

AAAI 2024paperarXiv:2402.15933
26
citations

CarFormer: Self-Driving with Learned Object-Centric Representations

Shadi Hamdan, Fatma Guney

ECCV 2024arXiv:2407.15843
12
citations

CLEO: Continual Learning of Evolving Ontologies

Shishir Muralidhara, Saqib Bukhari, Georg Dr. Schneider et al.

ECCV 2024arXiv:2407.08411
3
citations

CountFormer: Multi-View Crowd Counting Transformer

Hong Mo, Xiong Zhang, Jianchao Tan et al.

ECCV 2024arXiv:2407.02047
9
citations

DeTra: A Unified Model for Object Detection and Trajectory Forecasting

Sergio Casas, Ben T Agro, Jiageng Mao et al.

ECCV 2024arXiv:2406.04426
12
citations

EgoGen: An Egocentric Synthetic Data Generator

Gen Li, Kaifeng Zhao, Siwei Zhang et al.

CVPR 2024arXiv:2401.08739
24
citations

EgoPet: Egomotion and Interaction Data from an Animal's Perspective

Amir Bar, Arya Bakhtiar, Danny L Tran et al.

ECCV 2024arXiv:2404.09991
11
citations

FastCAD: Real-Time CAD Retrieval and Alignment from Scans and Videos

Florian Langer, Jihong Ju, Georgi Dikov et al.

ECCV 2024arXiv:2403.15161
6
citations

General Geometry-aware Weakly Supervised 3D Object Detection

Guowen Zhang, Junsong Fan, Liyi Chen et al.

ECCV 2024arXiv:2407.13748
5
citations

GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

Kai Zhang, Sai Bi, Hao Tan et al.

ECCV 2024arXiv:2404.19702
251
citations

GSN: Generalisable Segmentation in Neural Radiance Field

Siddharth Barman, Umang Bhaskar, Yeshwant Pandit et al.

AAAI 2024paperarXiv:2402.04632
1
citations

Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping

Hyeongjun Kwon, Jinhyun Jang, Jin Kim et al.

CVPR 2024arXiv:2404.00974
10
citations

Language Model Guided Interpretable Video Action Reasoning

Ning Wang, Guangming Zhu, Hongsheng Li et al.

CVPR 2024arXiv:2404.01591
7
citations

Making Large Language Models Better Planners with Reasoning-Decision Alignment

Zhijian Huang, Tao Tang, Shaoxiang Chen et al.

ECCV 2024arXiv:2408.13890
40
citations

MC-PanDA: Mask Confidence for Panoptic Domain Adaptation

Ivan Martinovic, Josip Šarić, Siniša Šegvić

ECCV 2024arXiv:2407.14110
2
citations

MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders

Baijiong Lin, Weisen Jiang, Pengguang Chen et al.

ECCV 2024arXiv:2407.02228
29
citations

NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields

Muhammad Zubair Irshad, Sergey Zakharov, Vitor Guizilini et al.

ECCV 2024arXiv:2404.01300
22
citations

Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields

Yonggan Fu, Huaizhi Qu, Zhifan Ye et al.

ECCV 2024arXiv:2403.11131

OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning

Haiyang Ying, Yixuan Yin, Jinzhi Zhang et al.

CVPR 2024arXiv:2311.11666
68
citations

Open Panoramic Segmentation

Junwei Zheng, Ruiping Liu, Yufan Chen et al.

ECCV 2024arXiv:2407.02685
15
citations

OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models

Zijian Zhou, Zheng Zhu, Holger Caesar et al.

ECCV 2024arXiv:2407.11213
13
citations

PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion

Runsong Zhu, Shi Qiu, Qianyi Wu et al.

ECCV 2024arXiv:2410.10659
9
citations

Random Walk on Pixel Manifolds for Anomaly Segmentation of Complex Driving Scenes

Zelong Zeng, Kaname Tomite

ECCV 2024arXiv:2404.17961
1
citations

Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation

Bingxin Ke, Anton Obukhov, Shengyu Huang et al.

CVPR 2024arXiv:2312.02145
332
citations

SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation

Junjie Zhang, Chenjia Bai, Haoran He et al.

ICML 2024arXiv:2405.19586
27
citations

SAM-guided Graph Cut for 3D Instance Segmentation

Haoyu Guo, He Zhu, Sida Peng et al.

ECCV 2024arXiv:2312.08372
32
citations

ScanERU: Interactive 3D Visual Grounding Based on Embodied Reference Understanding

Ziyang Lu, Yunqiang Pei, Guoqing Wang et al.

AAAI 2024paperarXiv:2303.13186
12
citations

Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection

Tim Salzmann, Markus Ryll, Alex Bewley et al.

ECCV 2024arXiv:2403.14270
8
citations