2025 "scene understanding" Papers
65 papers found • Page 1 of 2
2D Gaussian Splatting-based Sparse-view Transparent Object Depth Reconstruction via Physics Simulation for Scene Update
Jeongyun Kim, Seunghoon Jeong, Giseop Kim et al.
3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer
Jiajun Deng, Tianyu He, Li Jiang et al.
ADAPT: Attentive Self-Distillation and Dual-Decoder Prediction Fusion for Continual Panoptic Segmentation
Ze Yang, Shichao Dong, Ruibo Li et al.
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
Xin Wen, Bingchen Zhao, Yilun Chen et al.
A Dataset for Semantic Segmentation in the Presence of Unknowns
Zakaria Laskar, Tomas Vojir, Matej Grcic et al.
AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models
Xinyi Wang, Xun Yang, Yanlong Xu et al.
ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction
YUEJIAO SU, Yi Wang, Qiongyang Hu et al.
Auto-Vocabulary Semantic Segmentation
Osman Ülger, Maksymilian Kulicki, Yuki Asano et al.
Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind
Qingmei Li, Yang Zhang, Zurong Mai et al.
Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis
Lei-lei Li, Jianwu Fang, Junbin Xiao et al.
CG-SSL: Concept-Guided Self-Supervised Learning
Sara Atito, Josef Kittler, Imran Razzak et al.
Contact-Aware Amodal Completion for Human-Object Interaction via Multi-Regional Inpainting
Seunggeun Chi, Pin-Hao Huang, Enna Sachdeva et al.
CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step
Zheyuan Liu, Munan Ning, Qihui Zhang et al.
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding
Henry Zheng, Hao Shi, Qihang Peng et al.
Diffusion Classifiers Understand Compositionality, but Conditions Apply
Yujin Jeong, Arnas Uselis, Seong Joon Oh et al.
Diffusion Models for Attribution
Xiongren Chen, Jiuyong Li, Jixue Liu et al.
Distilling Multi-modal Large Language Models for Autonomous Driving
Deepti Hegde, Rajeev Yasarla, Hong Cai et al.
DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving
Xiaosong Jia, Junqi You, Zhiyuan Zhang et al.
DynAlign: Unsupervised Dynamic Taxonomy Alignment for Cross-Domain Segmentation
HAN SUN, Rui Gong, Ismail Nejjar et al.
Easy3D: A Simple Yet Effective Method for 3D Interactive Segmentation
Andrea Simonelli, Norman Müller, Peter Kontschieder
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond
Guanyao Wu, Haoyu Liu, Hongming Fu et al.
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
Amir Mohammad Karimi Mamaghan, Samuele Papa, Karl H. Johansson et al.
Flow Matching-Based Autonomous Driving Planning with Advanced Interactive Behavior Modeling
Tianyi Tan, Yinan Zheng, Ruiming Liang et al.
From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes
Tianxu Wang, Zhuofan Zhang, Ziyu Zhu et al.
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders
Fiona Ryan, Ajay Bati, Sangmin Lee et al.
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
Muhammad Danish, Muhammad Akhtar Munir, Syed Shah et al.
GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding
Zijun Lin, Shuting He, Cheston Tan et al.
HAMSt3R: Human-Aware Multi-view Stereo 3D Reconstruction
Sara Rojas Martinez, Matthieu Armando, Bernard Ghanem et al.
HiERO: Understanding the Hierarchy of Human Behavior Enhances Reasoning on Egocentric Videos
Simone Alberto Peirone, Francesca Pistilli, Giuseppe Averta
HouseLayout3D: A Benchmark and Training-free Baseline for 3D Layout Estimation in the Wild
Valentin Bieri, Marie-Julie Rakotosaona, Keisuke Tateno et al.
HumorDB: Can AI understand graphical humor?
Vedaant V Jain, Gabriel Kreiman, Felipe Feitosa
HUMOTO: A 4D Dataset of Mocap Human Object Interactions
Jiaxin Lu, Chun-Hao Huang, Uttaran Bhattacharya et al.
Is Tracking really more challenging in First Person Egocentric Vision?
Matteo Dunnhofer, Zaira Manigrasso, Christian Micheloni
Knowledge Transfer from Interaction Learning
Yilin Gao, Kangyi Chen, Zhongxing Peng et al.
LACONIC: A 3D Layout Adapter for Controllable Image Creation
Léopold Maillard, Tom Durand, Adrien RAMANANA RAHARY et al.
Learning 3D Scene Analogies with Neural Contextual Scene Maps
Junho Kim, Gwangtak Bae, Eun Sun Lee et al.
MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World
Ankit Dhiman, Manan Shah, R. Venkatesh Babu
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation
Jiaxin Huang, Runnan Chen, Ziwen Li et al.
MLLMs Need 3D-Aware Representation Supervision for Scene Understanding
Xiaohu Huang, Jingjing Wu, Qunyi Xie et al.
MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios
Jiacheng Ruan, Wenzhen Yuan, Zehao Lin et al.
MMCSBench: A Fine-Grained Benchmark for Large Vision-Language Models in Camouflage Scenes
Jin Zhang, Ruiheng Zhang, Zhe Cao et al.
mmWalk: Towards Multi-modal Multi-view Walking Assistance
Kedi Ying, Ruiping Liu, Chongyan Chen et al.
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
Fei Wang, XINGYU FU, James Y. Huang et al.
Multi-scale Temporal Prediction via Incremental Generation and Multi-agent Collaboration
Zhitao Zeng, Guojian Yuan, Junyuan Mao et al.
Multi-view Reconstruction via SfM-guided Monocular Depth Estimation
Haoyu Guo, He Zhu, Sida Peng et al.
ObjectMover: Generative Object Movement with Video Prior
Xin Yu, Tianyu Wang, Soo Ye Kim et al.
Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences
Nikos Dimitriadis, Pascal Frossard, François Fleuret
PolarFree: Polarization-based Reflection-Free Imaging
Mingde Yao, Menglu Wang, King Man Tam et al.
Prioritizing Perception-Guided Self-Supervision: A New Paradigm for Causal Modeling in End-to-End Autonomous Driving
Yi Huang, Zhan Qu, Lihui Jiang et al.
Promptable 3-D Object Localization with Latent Diffusion Models
Cheng-Yao Hong, Li-Heng Wang, Tyng-Luh Liu