"3d scene understanding" Papers

28 papers found

3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds

Hengshuo Chu, Xiang Deng, Qi Lv et al.

ICLR 2025 poster arXiv:2502.20041
15 citations

3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding

Tatiana Zemskova, Dmitry Yudin

ICCV 2025 poster arXiv:2412.18450
11 citations

ASHiTA: Automatic Scene-grounded HIerarchical Task Analysis

Yun Chang, Leonor Fermoselle, Duy Ta et al.

CVPR 2025 poster arXiv:2504.06553
3 citations

Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model

Benlin Liu, Yuhao Dong, Yiqin Wang et al.

CVPR 2025 poster arXiv:2408.00754
9 citations

CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement

Yun Liu, Chengwen Zhang, Ruofan Xing et al.

CVPR 2025 poster arXiv:2406.19353
25 citations

COS3D: Collaborative Open-Vocabulary 3D Segmentation

Runsong Zhu, Ka-Hei Hui, Zhengzhe Liu et al.

NeurIPS 2025 poster arXiv:2510.20238
1 citation

CrossOver: 3D Scene Cross-Modal Alignment

Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys et al.

CVPR 2025 highlight arXiv:2502.15011
7 citations

DiSCO-3D: Discovering and Segmenting Sub-Concepts from Open-vocabulary Queries in NeRF

Doriand Petit, Steve Bourgeois, Vincent Gay-Bellile et al.

ICCV 2025 poster arXiv:2507.14596
1 citation

HD-EPIC: A Highly-Detailed Egocentric Video Dataset

Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha et al.

CVPR 2025 poster arXiv:2502.04144
38 citations

LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS

Wanhua Li, Yujie Zhao, Minghan Qin et al.

NeurIPS 2025 poster arXiv:2507.07136
7 citations

Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors

Duo Zheng, Shijia Huang, Yanyang Li et al.

NeurIPS 2025 poster arXiv:2505.24625
24 citations

Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding

Pedro Hermosilla, Christian Stippel, Leon Sick

CVPR 2025 poster arXiv:2504.06719

PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction

Sinisa Stekovic, Arslan Artykov, Stefan Ainetter et al.

CVPR 2025 poster arXiv:2404.10620
4 citations

Reasoning Beyond Points: A Visual Introspective Approach for Few-Shot 3D Segmentation

Changshuo Wang, Shuting He, Xiang Fang et al.

NeurIPS 2025 poster

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Enshen Zhou, Jingkun An, Cheng Chi et al.

NeurIPS 2025 poster arXiv:2506.04308
51 citations

ScoreHOI: Physically Plausible Reconstruction of Human-Object Interaction via Score-Guided Diffusion

Ao Li, Jinpeng Liu, Yixuan Zhu et al.

ICCV 2025 poster arXiv:2509.07920

SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Model

Yue Zhang, Zhiyang Xu, Ying Shen et al.

ICLR 2025 poster arXiv:2410.03878
19 citations

Tri-MARF: A Tri-Modal Multi-Agent Responsive Framework for Comprehensive 3D Object Annotation

Jusheng Zhang, Yijia Fan, Zimo Wen et al.

NeurIPS 2025 poster

An Embodied Generalist Agent in 3D World

Jiangyong Huang, Silong Yong, Xiaojian Ma et al.

ICML 2024 poster

ConDense: Consistent 2D-3D Pre-training for Dense and Sparse Features from Multi-View Images

Xiaoshuai Zhang, Zhicheng Wang, Howard Zhou et al.

ECCV 2024 poster arXiv:2408.17027
8 citations

Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding

Ruihuang Li, Zhengqiang Zhang, Chenhang He et al.

ECCV 2024 poster arXiv:2407.09781
11 citations

Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion

Bohan Li, Jiajun Deng, Wenyao Zhang et al.

ECCV 2024 poster arXiv:2407.02077
31 citations

Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors

Wen Yuan Zhang, Kanle Shi, Yushen Liu et al.

ECCV 2024 poster

M3SOT: Multi-Frame, Multi-Field, Multi-Space 3D Single Object Tracking

Jiaming Liu, Yue Wu, Maoguo Gong et al.

AAAI 2024 paper arXiv:2312.06117
12 citations

NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving

Tianwen Qian, Jingjing Chen, Linhai Zhuo et al.

AAAI 2024 paper arXiv:2305.14836
266 citations

OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

Zhening Huang, Xiaoyang Wu, Xi Chen et al.

ECCV 2024 poster arXiv:2309.00616
82 citations

Pseudo-Embedding for Generalized Few-Shot Point Cloud Segmentation

Chih-Jung Tsai, Hwann-Tzong Chen, Tyng-Luh Liu

ECCV 2024 poster

SegPoint: Segment Any Point Cloud via Large Language Model

Shuting He, Henghui Ding, Xudong Jiang et al.

ECCV 2024 poster arXiv:2407.13761
37 citations