2025 Poster "scene understanding" Papers

56 papers found • Page 1 of 2

2D Gaussian Splatting-based Sparse-view Transparent Object Depth Reconstruction via Physics Simulation for Scene Update

Jeongyun Kim, Seunghoon Jeong, Giseop Kim et al.

ICCV 2025posterarXiv:2507.11069
1
citations

3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer

Jiajun Deng, Tianyu He, Li Jiang et al.

CVPR 2025posterarXiv:2501.01163
39
citations

ADAPT: Attentive Self-Distillation and Dual-Decoder Prediction Fusion for Continual Panoptic Segmentation

Ze Yang, Shichao Dong, Ruibo Li et al.

ICLR 2025poster

A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning

Xin Wen, Bingchen Zhao, Yilun Chen et al.

CVPR 2025posterarXiv:2503.06960
4
citations

A Dataset for Semantic Segmentation in the Presence of Unknowns

Zakaria Laskar, Tomas Vojir, Matej Grcic et al.

CVPR 2025posterarXiv:2503.22309

AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models

Xinyi Wang, Xun Yang, Yanlong Xu et al.

NEURIPS 2025posterarXiv:2511.10017
1
citations

ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction

YUEJIAO SU, Yi Wang, Qiongyang Hu et al.

CVPR 2025posterarXiv:2504.01472
4
citations

Auto-Vocabulary Semantic Segmentation

Osman Ülger, Maksymilian Kulicki, Yuki Asano et al.

ICCV 2025posterarXiv:2312.04539
4
citations

Beyond Human Perception: Understanding Multi-Object World from Monocular View

Keyu Guo, Yongle Huang, Shijie Sun et al.

CVPR 2025poster
2
citations

Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind

Qingmei Li, Yang Zhang, Zurong Mai et al.

NEURIPS 2025posterarXiv:2505.12207
1
citations

Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis

Lei-lei Li, Jianwu Fang, Junbin Xiao et al.

ICCV 2025posterarXiv:2506.23263

CG-SSL: Concept-Guided Self-Supervised Learning

Sara Atito, Josef Kittler, Imran Razzak et al.

NEURIPS 2025poster

CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step

Zheyuan Liu, Munan Ning, Qihui Zhang et al.

NEURIPS 2025posterarXiv:2507.04451
4
citations

DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding

Henry Zheng, Hao Shi, Qihang Peng et al.

ICLR 2025posterarXiv:2505.04965
8
citations

Diffusion Classifiers Understand Compositionality, but Conditions Apply

Yujin Jeong, Arnas Uselis, Seong Joon Oh et al.

NEURIPS 2025posterarXiv:2505.17955
3
citations

Distilling Multi-modal Large Language Models for Autonomous Driving

Deepti Hegde, Rajeev Yasarla, Hong Cai et al.

CVPR 2025posterarXiv:2501.09757
28
citations

DynAlign: Unsupervised Dynamic Taxonomy Alignment for Cross-Domain Segmentation

HAN SUN, Rui Gong, Ismail Nejjar et al.

ICLR 2025posterarXiv:2501.16410
1
citations

Easy3D: A Simple Yet Effective Method for 3D Interactive Segmentation

Andrea Simonelli, Norman Müller, Peter Kontschieder

ICCV 2025posterarXiv:2504.11024
1
citations

Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond

Guanyao Wu, Haoyu Liu, Hongming Fu et al.

CVPR 2025posterarXiv:2503.01210
26
citations

Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models

Amir Mohammad Karimi Mamaghan, Samuele Papa, Karl H. Johansson et al.

ICLR 2025posterarXiv:2407.15589
12
citations

From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes

Tianxu Wang, Zhuofan Zhang, Ziyu Zhu et al.

NEURIPS 2025posterarXiv:2506.04897
1
citations

GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding

Zijun Lin, Shuting He, Cheston Tan et al.

ICCV 2025posterarXiv:2506.21188
2
citations

HAMSt3R: Human-Aware Multi-view Stereo 3D Reconstruction

Sara Rojas Martinez, Matthieu Armando, Bernard Ghanem et al.

ICCV 2025posterarXiv:2508.16433
4
citations

HiERO: Understanding the Hierarchy of Human Behavior Enhances Reasoning on Egocentric Videos

Simone Alberto Peirone, Francesca Pistilli, Giuseppe Averta

ICCV 2025posterarXiv:2505.12911

HouseLayout3D: A Benchmark and Training-free Baseline for 3D Layout Estimation in the Wild

Valentin Bieri, Marie-Julie Rakotosaona, Keisuke Tateno et al.

NEURIPS 2025posterarXiv:2512.02450

HumorDB: Can AI understand graphical humor?

Vedaant V Jain, Gabriel Kreiman, Felipe Feitosa

ICCV 2025posterarXiv:2406.13564
1
citations

HUMOTO: A 4D Dataset of Mocap Human Object Interactions

Jiaxin Lu, Chun-Hao Huang, Uttaran Bhattacharya et al.

ICCV 2025posterarXiv:2504.10414
7
citations

Knowledge Transfer from Interaction Learning

Yilin Gao, Kangyi Chen, Zhongxing Peng et al.

ICCV 2025posterarXiv:2509.18733

LACONIC: A 3D Layout Adapter for Controllable Image Creation

Léopold Maillard, Tom Durand, Adrien RAMANANA RAHARY et al.

ICCV 2025posterarXiv:2507.03257

Learning 3D Scene Analogies with Neural Contextual Scene Maps

Junho Kim, Gwangtak Bae, Eun Sun Lee et al.

ICCV 2025posterarXiv:2503.15897
1
citations

MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World

Ankit Dhiman, Manan Shah, R. Venkatesh Babu

CVPR 2025posterarXiv:2504.15397
1
citations

MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation

Jiaxin Huang, Runnan Chen, Ziwen Li et al.

NEURIPS 2025posterarXiv:2503.18135
9
citations

MLLMs Need 3D-Aware Representation Supervision for Scene Understanding

Xiaohu Huang, Jingjing Wu, Qunyi Xie et al.

NEURIPS 2025posterarXiv:2506.01946
17
citations

MMCSBench: A Fine-Grained Benchmark for Large Vision-Language Models in Camouflage Scenes

Jin Zhang, Ruiheng Zhang, Zhe Cao et al.

NEURIPS 2025poster

mmWalk: Towards Multi-modal Multi-view Walking Assistance

Kedi Ying, Ruiping Liu, Chongyan Chen et al.

NEURIPS 2025posterarXiv:2510.11520

Multi-view Reconstruction via SfM-guided Monocular Depth Estimation

Haoyu Guo, He Zhu, Sida Peng et al.

CVPR 2025posterarXiv:2503.14483
11
citations

ObjectMover: Generative Object Movement with Video Prior

Xin Yu, Tianyu Wang, Soo Ye Kim et al.

CVPR 2025posterarXiv:2503.08037
10
citations

Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences

Nikos Dimitriadis, Pascal Frossard, François Fleuret

ICLR 2025posterarXiv:2407.08056
9
citations

PolarFree: Polarization-based Reflection-Free Imaging

Mingde Yao, Menglu Wang, King Man Tam et al.

CVPR 2025posterarXiv:2503.18055
4
citations

Prioritizing Perception-Guided Self-Supervision: A New Paradigm for Causal Modeling in End-to-End Autonomous Driving

Yi Huang, Zhan Qu, Lihui Jiang et al.

NEURIPS 2025posterarXiv:2511.08214
1
citations

Promptable 3-D Object Localization with Latent Diffusion Models

Cheng-Yao Hong, Li-Heng Wang, Tyng-Luh Liu

NEURIPS 2025poster

PSI: A Benchmark for Human Interpretation and Response in Traffic Interactions

TAOTAO JING, Tina Chen, Renran Tian et al.

NEURIPS 2025poster

Relation3D : Enhancing Relation Modeling for Point Cloud Instance Segmentation

Edward LOO, Jiacheng Deng

CVPR 2025posterarXiv:2506.17891
4
citations

RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving

Zhijian Huang, Chengjian Feng, Baihui Xiao et al.

ICCV 2025posterarXiv:2412.07689
11
citations

Robust 3D Object Detection using Probabilistic Point Clouds from Single-Photon LiDARs

Bhavya Goyal, Felipe Gutierrez-Barragan, Wei Lin et al.

ICCV 2025posterarXiv:2508.00169
2
citations

SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining

Yue Li, Qi Ma, Runyi Yang et al.

ICCV 2025posterarXiv:2503.18052
21
citations

SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation

Yunxiang Fu, Meng Lou, Yizhou Yu

CVPR 2025posterarXiv:2412.11890
22
citations

Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation

Yong Liu, Song-Li Wu, Sule Bai et al.

ICCV 2025posterarXiv:2506.16058
2
citations

Supercharging Floorplan Localization with Semantic Rays

Yuval Grader, Hadar Averbuch-Elor

ICCV 2025posterarXiv:2507.09291
2
citations

TACO: Taming Diffusion for in-the-wild Video Amodal Completion

Ruijie Lu, Yixin Chen, Yu Liu et al.

ICCV 2025posterarXiv:2503.12049
10
citations
PreviousNext