Poster "scene understanding" Papers

87 papers found

Task-Oriented Human Grasp Synthesis via Context- and Task-Aware Diffusers

An Lun Liu, Yu-Wei Chao, Yi-Ting Chen

ICCV 2025 • arXiv:2507.11287

The 3D-PC: a benchmark for visual perspective taking in humans and machines

Drew Linsley, Peisen Zhou, Alekh Ashok et al.

ICLR 2025 • arXiv:2406.04138 • 10 citations

TopoPoint: Enhance Topology Reasoning via Endpoint Detection in Autonomous Driving

Yanping Fu, Xinyuan Liu, Tianyu Li et al.

NEURIPS 2025 • arXiv:2505.17771 • 4 citations

Towards Efficient Foundation Model for Zero-shot Amodal Segmentation

Zhaochen Liu, Limeng Qiao, Xiangxiang Chu et al.

CVPR 2025 • 3 citations

Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA

Zhixuan Li, Hyunse Yoon, Sanghoon Lee et al.

ICCV 2025 • arXiv:2503.10225 • 3 citations

Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting

Anand Bhattad, Konpat Preechakul, Alexei Efros

NEURIPS 2025 • arXiv:2503.21770 • 8 citations

3D Small Object Detection with Dynamic Spatial Pruning

Xiuwei Xu, Zhihao Sun, Ziwei Wang et al.

ECCV 2024 • arXiv:2305.03716 • 9 citations

An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding

Wei Chen, Long Chen, Yu Wu

ECCV 2024 • arXiv:2408.01120 • 17 citations

CarFormer: Self-Driving with Learned Object-Centric Representations

Shadi Hamdan, Fatma Guney

ECCV 2024 • arXiv:2407.15843 • 12 citations

CLEO: Continual Learning of Evolving Ontologies

Shishir Muralidhara, Saqib Bukhari, Georg Schneider et al.

ECCV 2024 • arXiv:2407.08411 • 3 citations

CountFormer: Multi-View Crowd Counting Transformer

Hong Mo, Xiong Zhang, Jianchao Tan et al.

ECCV 2024 • arXiv:2407.02047 • 9 citations

DeTra: A Unified Model for Object Detection and Trajectory Forecasting

Sergio Casas, Ben T Agro, Jiageng Mao et al.

ECCV 2024 • arXiv:2406.04426 • 12 citations

EgoGen: An Egocentric Synthetic Data Generator

Gen Li, Kaifeng Zhao, Siwei Zhang et al.

CVPR 2024 • arXiv:2401.08739 • 24 citations

EgoPet: Egomotion and Interaction Data from an Animal's Perspective

Amir Bar, Arya Bakhtiar, Danny L Tran et al.

ECCV 2024 • arXiv:2404.09991 • 11 citations

FastCAD: Real-Time CAD Retrieval and Alignment from Scans and Videos

Florian Langer, Jihong Ju, Georgi Dikov et al.

ECCV 2024 • arXiv:2403.15161 • 6 citations

General Geometry-aware Weakly Supervised 3D Object Detection

Guowen Zhang, Junsong Fan, Liyi Chen et al.

ECCV 2024 • arXiv:2407.13748 • 5 citations

GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

Kai Zhang, Sai Bi, Hao Tan et al.

ECCV 2024 • arXiv:2404.19702 • 251 citations

Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping

Hyeongjun Kwon, Jinhyun Jang, Jin Kim et al.

CVPR 2024 • arXiv:2404.00974 • 10 citations

Language Model Guided Interpretable Video Action Reasoning

Ning Wang, Guangming Zhu, Hongsheng Li et al.

CVPR 2024 • arXiv:2404.01591 • 7 citations

Making Large Language Models Better Planners with Reasoning-Decision Alignment

Zhijian Huang, Tao Tang, Shaoxiang Chen et al.

ECCV 2024 • arXiv:2408.13890 • 40 citations

MC-PanDA: Mask Confidence for Panoptic Domain Adaptation

Ivan Martinovic, Josip Šarić, Siniša Šegvić

ECCV 2024 • arXiv:2407.14110 • 2 citations

MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders

Baijiong Lin, Weisen Jiang, Pengguang Chen et al.

ECCV 2024 • arXiv:2407.02228 • 29 citations

NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields

Muhammad Zubair Irshad, Sergey Zakharov, Vitor Guizilini et al.

ECCV 2024 • arXiv:2404.01300 • 22 citations

Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields

Yonggan Fu, Huaizhi Qu, Zhifan Ye et al.

ECCV 2024 • arXiv:2403.11131

OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning

Haiyang Ying, Yixuan Yin, Jinzhi Zhang et al.

CVPR 2024 • arXiv:2311.11666 • 68 citations

Open Panoramic Segmentation

Junwei Zheng, Ruiping Liu, Yufan Chen et al.

ECCV 2024 • arXiv:2407.02685 • 15 citations

OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models

Zijian Zhou, Zheng Zhu, Holger Caesar et al.

ECCV 2024 • arXiv:2407.11213 • 13 citations

PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion

Runsong Zhu, Shi Qiu, Qianyi Wu et al.

ECCV 2024 • arXiv:2410.10659 • 9 citations

Random Walk on Pixel Manifolds for Anomaly Segmentation of Complex Driving Scenes

Zelong Zeng, Kaname Tomite

ECCV 2024 • arXiv:2404.17961 • 1 citation

Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation

Bingxin Ke, Anton Obukhov, Shengyu Huang et al.

CVPR 2024 • arXiv:2312.02145 • 332 citations

SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation

Junjie Zhang, Chenjia Bai, Haoran He et al.

ICML 2024 • arXiv:2405.19586 • 27 citations

SAM-guided Graph Cut for 3D Instance Segmentation

Haoyu Guo, He Zhu, Sida Peng et al.

ECCV 2024 • arXiv:2312.08372 • 32 citations

Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection

Tim Salzmann, Markus Ryll, Alex Bewley et al.

ECCV 2024 • arXiv:2403.14270 • 8 citations

Self-Training Room Layout via Geometry-aware Ray-casting

Bolivar Solarte, Chin-Hsuan Wu, Jin-Cheng Jhang et al.

ECCV 2024 • 4 citations

Training-Free Model Merging for Multi-target Domain Adaptation

Wenyi Li, Huan-ang Gao, Mingju Gao et al.

ECCV 2024 • arXiv:2407.13771 • 12 citations

TrajPrompt: Aligning Color Trajectory with Vision-Language Representations

Li-Wu Tsao, Hao-Tang Tsui, Yu-Rou Tuan et al.

ECCV 2024

VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions

Seokha Moon, Hyun Woo, Hongbeen Park et al.

ECCV 2024 • arXiv:2407.12345 • 22 citations