2025 "scene understanding" Papers

65 papers found • Page 1 of 2

2D Gaussian Splatting-based Sparse-view Transparent Object Depth Reconstruction via Physics Simulation for Scene Update

Jeongyun Kim, Seunghoon Jeong, Giseop Kim et al.

ICCV 2025 • poster • arXiv:2507.11069 • 1 citation

3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer

Jiajun Deng, Tianyu He, Li Jiang et al.

CVPR 2025 • poster • arXiv:2501.01163 • 39 citations

ADAPT: Attentive Self-Distillation and Dual-Decoder Prediction Fusion for Continual Panoptic Segmentation

Ze Yang, Shichao Dong, Ruibo Li et al.

ICLR 2025 • poster

A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning

Xin Wen, Bingchen Zhao, Yilun Chen et al.

CVPR 2025 • poster • arXiv:2503.06960 • 4 citations

A Dataset for Semantic Segmentation in the Presence of Unknowns

Zakaria Laskar, Tomas Vojir, Matej Grcic et al.

CVPR 2025 • poster • arXiv:2503.22309

AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models

Xinyi Wang, Xun Yang, Yanlong Xu et al.

NEURIPS 2025 • poster • arXiv:2511.10017 • 1 citation

ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction

Yuejiao Su, Yi Wang, Qiongyang Hu et al.

CVPR 2025 • poster • arXiv:2504.01472 • 4 citations

Auto-Vocabulary Semantic Segmentation

Osman Ülger, Maksymilian Kulicki, Yuki Asano et al.

ICCV 2025 • poster • arXiv:2312.04539 • 4 citations

Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind

Qingmei Li, Yang Zhang, Zurong Mai et al.

NEURIPS 2025 • poster • arXiv:2505.12207 • 1 citation

Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis

Lei-lei Li, Jianwu Fang, Junbin Xiao et al.

ICCV 2025 • poster • arXiv:2506.23263

CG-SSL: Concept-Guided Self-Supervised Learning

Sara Atito, Josef Kittler, Imran Razzak et al.

NEURIPS 2025 • poster

Contact-Aware Amodal Completion for Human-Object Interaction via Multi-Regional Inpainting

Seunggeun Chi, Pin-Hao Huang, Enna Sachdeva et al.

ICCV 2025 • highlight • arXiv:2508.00427 • 2 citations

CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step

Zheyuan Liu, Munan Ning, Qihui Zhang et al.

NEURIPS 2025 • poster • arXiv:2507.04451 • 4 citations

DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding

Henry Zheng, Hao Shi, Qihang Peng et al.

ICLR 2025 • poster • arXiv:2505.04965 • 8 citations

Diffusion Classifiers Understand Compositionality, but Conditions Apply

Yujin Jeong, Arnas Uselis, Seong Joon Oh et al.

NEURIPS 2025 • poster • arXiv:2505.17955 • 3 citations

Diffusion Models for Attribution

Xiongren Chen, Jiuyong Li, Jixue Liu et al.

AAAI 2025 • paper • arXiv:2403.14790 • 12 citations

Distilling Multi-modal Large Language Models for Autonomous Driving

Deepti Hegde, Rajeev Yasarla, Hong Cai et al.

CVPR 2025 • poster • arXiv:2501.09757 • 27 citations

DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving

Xiaosong Jia, Junqi You, Zhiyuan Zhang et al.

ICLR 2025 • oral • arXiv:2503.07656 • 67 citations

DynAlign: Unsupervised Dynamic Taxonomy Alignment for Cross-Domain Segmentation

Han Sun, Rui Gong, Ismail Nejjar et al.

ICLR 2025 • poster • arXiv:2501.16410 • 1 citation

Easy3D: A Simple Yet Effective Method for 3D Interactive Segmentation

Andrea Simonelli, Norman Müller, Peter Kontschieder

ICCV 2025 • poster • arXiv:2504.11024 • 1 citation

Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond

Guanyao Wu, Haoyu Liu, Hongming Fu et al.

CVPR 2025 • poster • arXiv:2503.01210 • 26 citations

Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models

Amir Mohammad Karimi Mamaghan, Samuele Papa, Karl H. Johansson et al.

ICLR 2025 • poster • arXiv:2407.15589 • 12 citations

Flow Matching-Based Autonomous Driving Planning with Advanced Interactive Behavior Modeling

Tianyi Tan, Yinan Zheng, Ruiming Liang et al.

NEURIPS 2025 • oral • arXiv:2510.11083 • 5 citations

From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes

Tianxu Wang, Zhuofan Zhang, Ziyu Zhu et al.

NEURIPS 2025 • poster • arXiv:2506.04897

Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders

Fiona Ryan, Ajay Bati, Sangmin Lee et al.

CVPR 2025 • highlight • arXiv:2412.09586 • 19 citations

GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks

Muhammad Danish, Muhammad Akhtar Munir, Syed Shah et al.

ICCV 2025 • highlight • arXiv:2411.19325 • 25 citations

GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding

Zijun Lin, Shuting He, Cheston Tan et al.

ICCV 2025 • poster • arXiv:2506.21188 • 2 citations

HAMSt3R: Human-Aware Multi-view Stereo 3D Reconstruction

Sara Rojas Martinez, Matthieu Armando, Bernard Ghanem et al.

ICCV 2025 • poster • arXiv:2508.16433 • 4 citations

HiERO: Understanding the Hierarchy of Human Behavior Enhances Reasoning on Egocentric Videos

Simone Alberto Peirone, Francesca Pistilli, Giuseppe Averta

ICCV 2025 • poster • arXiv:2505.12911

HouseLayout3D: A Benchmark and Training-free Baseline for 3D Layout Estimation in the Wild

Valentin Bieri, Marie-Julie Rakotosaona, Keisuke Tateno et al.

NEURIPS 2025 • poster • arXiv:2512.02450

HumorDB: Can AI understand graphical humor?

Vedaant V Jain, Gabriel Kreiman, Felipe Feitosa

ICCV 2025 • poster • arXiv:2406.13564 • 1 citation

HUMOTO: A 4D Dataset of Mocap Human Object Interactions

Jiaxin Lu, Chun-Hao Huang, Uttaran Bhattacharya et al.

ICCV 2025 • poster • arXiv:2504.10414 • 6 citations

Is Tracking really more challenging in First Person Egocentric Vision?

Matteo Dunnhofer, Zaira Manigrasso, Christian Micheloni

ICCV 2025 • highlight • arXiv:2507.16015

Knowledge Transfer from Interaction Learning

Yilin Gao, Kangyi Chen, Zhongxing Peng et al.

ICCV 2025 • poster • arXiv:2509.18733

LACONIC: A 3D Layout Adapter for Controllable Image Creation

Léopold Maillard, Tom Durand, Adrien Ramanana Rahary et al.

ICCV 2025 • poster • arXiv:2507.03257

Learning 3D Scene Analogies with Neural Contextual Scene Maps

Junho Kim, Gwangtak Bae, Eun Sun Lee et al.

ICCV 2025 • poster • arXiv:2503.15897 • 1 citation

MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World

Ankit Dhiman, Manan Shah, R. Venkatesh Babu

CVPR 2025 • poster • arXiv:2504.15397 • 1 citation

MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation

Jiaxin Huang, Runnan Chen, Ziwen Li et al.

NEURIPS 2025 • poster • arXiv:2503.18135 • 8 citations

MLLMs Need 3D-Aware Representation Supervision for Scene Understanding

Xiaohu Huang, Jingjing Wu, Qunyi Xie et al.

NEURIPS 2025 • poster • arXiv:2506.01946 • 17 citations

MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios

Jiacheng Ruan, Wenzhen Yuan, Zehao Lin et al.

AAAI 2025 • paper • arXiv:2409.16084 • 11 citations

MMCSBench: A Fine-Grained Benchmark for Large Vision-Language Models in Camouflage Scenes

Jin Zhang, Ruiheng Zhang, Zhe Cao et al.

NEURIPS 2025 • poster

mmWalk: Towards Multi-modal Multi-view Walking Assistance

Kedi Ying, Ruiping Liu, Chongyan Chen et al.

NEURIPS 2025 • poster • arXiv:2510.11520

MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

Fei Wang, Xingyu Fu, James Y. Huang et al.

ICLR 2025 • oral • arXiv:2406.09411 • 113 citations

Multi-scale Temporal Prediction via Incremental Generation and Multi-agent Collaboration

Zhitao Zeng, Guojian Yuan, Junyuan Mao et al.

NEURIPS 2025 • oral • arXiv:2509.17429

Multi-view Reconstruction via SfM-guided Monocular Depth Estimation

Haoyu Guo, He Zhu, Sida Peng et al.

CVPR 2025 • poster • arXiv:2503.14483 • 11 citations

ObjectMover: Generative Object Movement with Video Prior

Xin Yu, Tianyu Wang, Soo Ye Kim et al.

CVPR 2025 • poster • arXiv:2503.08037 • 10 citations

Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences

Nikos Dimitriadis, Pascal Frossard, François Fleuret

ICLR 2025 • poster • arXiv:2407.08056 • 9 citations

PolarFree: Polarization-based Reflection-Free Imaging

Mingde Yao, Menglu Wang, King Man Tam et al.

CVPR 2025 • poster • arXiv:2503.18055 • 4 citations

Prioritizing Perception-Guided Self-Supervision: A New Paradigm for Causal Modeling in End-to-End Autonomous Driving

Yi Huang, Zhan Qu, Lihui Jiang et al.

NEURIPS 2025 • poster • arXiv:2511.08214 • 1 citation

Promptable 3-D Object Localization with Latent Diffusion Models

Cheng-Yao Hong, Li-Heng Wang, Tyng-Luh Liu

NEURIPS 2025 • poster