Poster "scene understanding" Papers
46 papers found
2D Gaussian Splatting-based Sparse-view Transparent Object Depth Reconstruction via Physics Simulation for Scene Update
Jeongyun Kim, Seunghoon Jeong, Giseop Kim et al.
3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer
Jiajun Deng, Tianyu He, Li Jiang et al.
ADAPT: Attentive Self-Distillation and Dual-Decoder Prediction Fusion for Continual Panoptic Segmentation
Ze Yang, Shichao Dong, Ruibo Li et al.
A Dataset for Semantic Segmentation in the Presence of Unknowns
Zakaria Laskar, Tomas Vojir, Matej Grcic et al.
Auto-Vocabulary Semantic Segmentation
Osman Ülger, Maksymilian Kulicki, Yuki Asano et al.
Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis
Lei-lei Li, Jianwu Fang, Junbin Xiao et al.
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding
Henry Zheng, Hao Shi, Qihang Peng et al.
Distilling Multi-modal Large Language Models for Autonomous Driving
Deepti Hegde, Rajeev Yasarla, Hong Cai et al.
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
Amir Mohammad Karimi Mamaghan, Samuele Papa, Karl H. Johansson et al.
From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes
Tianxu Wang, Zhuofan Zhang, Ziyu Zhu et al.
HouseLayout3D: A Benchmark and Training-free Baseline for 3D Layout Estimation in the Wild
Valentin Bieri, Marie-Julie Rakotosaona, Keisuke Tateno et al.
HUMOTO: A 4D Dataset of Mocap Human Object Interactions
Jiaxin Lu, Chun-Hao Huang, Uttaran Bhattacharya et al.
Knowledge Transfer from Interaction Learning
Yilin Gao, Kangyi Chen, Zhongxing Peng et al.
Learning 3D Scene Analogies with Neural Contextual Scene Maps
Junho Kim, Gwangtak Bae, Eun Sun Lee et al.
MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World
Ankit Dhiman, Manan Shah, R. Venkatesh Babu
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation
Jiaxin Huang, Runnan Chen, Ziwen Li et al.
MLLMs Need 3D-Aware Representation Supervision for Scene Understanding
Xiaohu Huang, Jingjing Wu, Qunyi Xie et al.
mmWalk: Towards Multi-modal Multi-view Walking Assistance
Kedi Ying, Ruiping Liu, Chongyan Chen et al.
PolarFree: Polarization-based Reflection-Free Imaging
Mingde Yao, Menglu Wang, King Man Tam et al.
Prioritizing Perception-Guided Self-Supervision: A New Paradigm for Causal Modeling in End-to-End Autonomous Driving
Yi Huang, Zhan Qu, Lihui Jiang et al.
Promptable 3-D Object Localization with Latent Diffusion Models
Cheng-Yao Hong, Li-Heng Wang, Tyng-Luh Liu
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
Yue Li, Qi Ma, Runyi Yang et al.
SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
Yunxiang Fu, Meng Lou, Yizhou Yu
Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation
Yong Liu, Song-Li Wu, Sule Bai et al.
Supercharging Floorplan Localization with Semantic Rays
Yuval Grader, Hadar Averbuch-Elor
Task-Oriented Human Grasp Synthesis via Context- and Task-Aware Diffusers
An Lun Liu, Yu-Wei Chao, Yi-Ting Chen
TopoPoint: Enhance Topology Reasoning via Endpoint Detection in Autonomous Driving
Yanping Fu, Xinyuan Liu, Tianyu Li et al.
Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting
Anand Bhattad, Konpat Preechakul, Alexei Efros
3D Small Object Detection with Dynamic Spatial Pruning
Xiuwei Xu, Zhihao Sun, Ziwei Wang et al.
CarFormer: Self-Driving with Learned Object-Centric Representations
Shadi Hamdan, Fatma Guney
CountFormer: Multi-View Crowd Counting Transformer
Hong Mo, Xiong Zhang, Jianchao Tan et al.
EgoPet: Egomotion and Interaction Data from an Animal's Perspective
Amir Bar, Arya Bakhtiar, Danny L Tran et al.
FastCAD: Real-Time CAD Retrieval and Alignment from Scans and Videos
Florian Langer, Jihong Ju, Georgi Dikov et al.
General Geometry-aware Weakly Supervised 3D Object Detection
Guowen Zhang, Junsong Fan, Liyi Chen et al.
GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting
Kai Zhang, Sai Bi, Hao Tan et al.
MC-PanDA: Mask Confidence for Panoptic Domain Adaptation
Ivan Martinovic, Josip Šarić, Siniša Šegvić
MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders
Baijiong Lin, Weisen Jiang, Pengguang Chen et al.
NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields
Muhammad Zubair Irshad, Sergey Zakharov, Vitor Guizilini et al.
Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields
Yonggan Fu, Huaizhi Qu, Zhifan Ye et al.
Open Panoramic Segmentation
Junwei Zheng, Ruiping Liu, Yufan Chen et al.
OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
Zijian Zhou, Zheng Zhu, Holger Caesar et al.
Random Walk on Pixel Manifolds for Anomaly Segmentation of Complex Driving Scenes
Zelong Zeng, Kaname Tomite
SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation
Junjie Zhang, Chenjia Bai, Haoran He et al.
SAM-guided Graph Cut for 3D Instance Segmentation
Haoyu Guo, He Zhu, Sida Peng et al.
Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection
Tim Salzmann, Markus Ryll, Alex Bewley et al.
Training-Free Model Merging for Multi-target Domain Adaptation
Wenyi Li, Huan-ang Gao, Mingju Gao et al.