Poster "3d scene understanding" Papers

33 papers found

3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds

Hengshuo Chu, Xiang Deng, Qi Lv et al.

ICLR 2025posterarXiv:2502.20041
15
citations

3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding

Tatiana Zemskova, Dmitry Yudin

ICCV 2025posterarXiv:2412.18450
11
citations

ASHiTA: Automatic Scene-grounded HIerarchical Task Analysis

Yun Chang, Leonor Fermoselle, Duy Ta et al.

CVPR 2025posterarXiv:2504.06553
3
citations

Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model

Benlin Liu, Yuhao Dong, Yiqin Wang et al.

CVPR 2025posterarXiv:2408.00754
9
citations

CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement

Yun Liu, Chengwen Zhang, Ruofan Xing et al.

CVPR 2025posterarXiv:2406.19353
25
citations

COS3D: Collaborative Open-Vocabulary 3D Segmentation

Runsong Zhu, Ka-Hei Hui, Zhengzhe Liu et al.

NeurIPS 2025posterarXiv:2510.20238
1
citations

DiSCO-3D : Discovering and Segmenting Sub-Concepts from Open-vocabulary Queries in NeRF

Doriand Petit, Steve Bourgeois, Vincent Gay-Bellile et al.

ICCV 2025posterarXiv:2507.14596
1
citations

HD-EPIC: A Highly-Detailed Egocentric Video Dataset

Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha et al.

CVPR 2025posterarXiv:2502.04144
38
citations

LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS

Wanhua Li, Yujie Zhao, Minghan Qin et al.

NeurIPS 2025posterarXiv:2507.07136
7
citations

Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors

Duo Zheng, shijia Huang, Yanyang Li et al.

NeurIPS 2025posterarXiv:2505.24625
24
citations

Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding

Pedro Hermosilla, Christian Stippel, Leon Sick

CVPR 2025posterarXiv:2504.06719

NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields

Amandine Brunetto, Sascha Hornauer, Fabien Moutarde

ICLR 2025posterarXiv:2405.18213
9
citations

PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction

Sinisa Stekovic, Arslan Artykov, Stefan Ainetter et al.

CVPR 2025posterarXiv:2404.10620
4
citations

Reasoning Beyond Points: A Visual Introspective Approach for Few-Shot 3D Segmentation

Changshuo Wang, Shuting He, Xiang Fang et al.

NeurIPS 2025poster

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Enshen Zhou, Jingkun An, Cheng Chi et al.

NeurIPS 2025posterarXiv:2506.04308
51
citations

ScoreHOI: Physically Plausible Reconstruction of Human-Object Interaction via Score-Guided Diffusion

AO LI, Jinpeng Liu, Yixuan Zhu et al.

ICCV 2025posterarXiv:2509.07920

SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Model

Yue Zhang, Zhiyang Xu, Ying Shen et al.

ICLR 2025posterarXiv:2410.03878
19
citations

SPAZER: Spatial-Semantic Progressive Reasoning Agent for Zero-shot 3D Visual Grounding

Zhao Jin, Rong-Cheng Tu, Jingyi Liao et al.

NeurIPS 2025posterarXiv:2506.21924
2
citations

TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes

Yan Xia, Yunxiang Lu, Rui Song et al.

ICCV 2025posterarXiv:2412.10308
1
citations

Tri-MARF: A Tri-Modal Multi-Agent Responsive Framework for Comprehensive 3D Object Annotation

jusheng zhang, Yijia Fan, Zimo Wen et al.

NeurIPS 2025poster

WildCAT3D: Appearance-Aware Multi-View Diffusion in the Wild

Morris Alper, David Novotny, Filippos Kokkinos et al.

NeurIPS 2025posterarXiv:2506.13030
1
citations

An Embodied Generalist Agent in 3D World

Jiangyong Huang, Silong Yong, Xiaojian Ma et al.

ICML 2024poster

AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation

Yangchao Wu, Tian Yu Liu, Hyoungseob Park et al.

ECCV 2024posterarXiv:2310.09739
15
citations

ConDense: Consistent 2D-3D Pre-training for Dense and Sparse Features from Multi-View Images

Xiaoshuai Zhang, Zhicheng Wang, Howard Zhou et al.

ECCV 2024posterarXiv:2408.17027
8
citations

Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding

Ruihuang Li, Zhengqiang ZHANG, Chenhang He et al.

ECCV 2024posterarXiv:2407.09781
11
citations

Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion

Bohan Li, Jiajun Deng, Wenyao Zhang et al.

ECCV 2024posterarXiv:2407.02077
31
citations

Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors

Wen Yuan Zhang, Kanle Shi, Yushen Liu et al.

ECCV 2024poster

M3DBench: Towards Omni 3D Assistant with Interleaved Multi-modal Instructions

Mingsheng Li, Xin Chen, Chi Zhang et al.

ECCV 2024poster
4
citations

OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

Zhening Huang, Xiaoyang Wu, Xi Chen et al.

ECCV 2024posterarXiv:2309.00616
82
citations

Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation

Pengfei Wang, Yuxi Wang, Shuai Li et al.

ECCV 2024posterarXiv:2407.13362
10
citations

Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

Xiaoyu Zhu, Hao Zhou, Pengfei Xing et al.

ECCV 2024posterarXiv:2407.13642
11
citations

Pseudo-Embedding for Generalized Few-Shot Point Cloud Segmentation

Chih-Jung Tsai, Hwann-Tzong Chen, Tyng-Luh Liu

ECCV 2024poster

SegPoint: Segment Any Point Cloud via Large Language Model

Shuting He, Henghui Ding, Xudong Jiang et al.

ECCV 2024posterarXiv:2407.13761
37
citations