Poster "scene understanding" Papers
87 papers found
Task-Oriented Human Grasp Synthesis via Context- and Task-Aware Diffusers
An Lun Liu, Yu-Wei Chao, Yi-Ting Chen
The 3D-PC: a benchmark for visual perspective taking in humans and machines
Drew Linsley, Peisen Zhou, Alekh Ashok et al.
TopoPoint: Enhance Topology Reasoning via Endpoint Detection in Autonomous Driving
Yanping Fu, Xinyuan Liu, Tianyu Li et al.
Towards Efficient Foundation Model for Zero-shot Amodal Segmentation
Zhaochen Liu, Limeng Qiao, Xiangxiang Chu et al.
Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA
Zhixuan Li, Hyunse Yoon, Sanghoon Lee et al.
Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting
Anand Bhattad, Konpat Preechakul, Alexei Efros
3D Small Object Detection with Dynamic Spatial Pruning
Xiuwei Xu, Zhihao Sun, Ziwei Wang et al.
An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding
Wei Chen, Long Chen, Yu Wu
CarFormer: Self-Driving with Learned Object-Centric Representations
Shadi Hamdan, Fatma Guney
CLEO: Continual Learning of Evolving Ontologies
Shishir Muralidhara, Saqib Bukhari, Georg Schneider et al.
CountFormer: Multi-View Crowd Counting Transformer
Hong Mo, Xiong Zhang, Jianchao Tan et al.
DeTra: A Unified Model for Object Detection and Trajectory Forecasting
Sergio Casas, Ben T Agro, Jiageng Mao et al.
EgoGen: An Egocentric Synthetic Data Generator
Gen Li, Kaifeng Zhao, Siwei Zhang et al.
EgoPet: Egomotion and Interaction Data from an Animal's Perspective
Amir Bar, Arya Bakhtiar, Danny L Tran et al.
FastCAD: Real-Time CAD Retrieval and Alignment from Scans and Videos
Florian Langer, Jihong Ju, Georgi Dikov et al.
General Geometry-aware Weakly Supervised 3D Object Detection
Guowen Zhang, Junsong Fan, Liyi Chen et al.
GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting
Kai Zhang, Sai Bi, Hao Tan et al.
Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping
Hyeongjun Kwon, Jinhyun Jang, Jin Kim et al.
Language Model Guided Interpretable Video Action Reasoning
Ning Wang, Guangming Zhu, Hongsheng Li et al.
Making Large Language Models Better Planners with Reasoning-Decision Alignment
Zhijian Huang, Tao Tang, Shaoxiang Chen et al.
MC-PanDA: Mask Confidence for Panoptic Domain Adaptation
Ivan Martinovic, Josip Šarić, Siniša Šegvić
MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders
Baijiong Lin, Weisen Jiang, Pengguang Chen et al.
NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields
Muhammad Zubair Irshad, Sergey Zakharov, Vitor Guizilini et al.
Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields
Yonggan Fu, Huaizhi Qu, Zhifan Ye et al.
OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning
Haiyang Ying, Yixuan Yin, Jinzhi Zhang et al.
Open Panoramic Segmentation
Junwei Zheng, Ruiping Liu, Yufan Chen et al.
OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
Zijian Zhou, Zheng Zhu, Holger Caesar et al.
PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion
Runsong Zhu, Shi Qiu, Qianyi Wu et al.
Random Walk on Pixel Manifolds for Anomaly Segmentation of Complex Driving Scenes
Zelong Zeng, Kaname Tomite
Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
Bingxin Ke, Anton Obukhov, Shengyu Huang et al.
SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation
Junjie Zhang, Chenjia Bai, Haoran He et al.
SAM-guided Graph Cut for 3D Instance Segmentation
Haoyu Guo, He Zhu, Sida Peng et al.
Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection
Tim Salzmann, Markus Ryll, Alex Bewley et al.
Self-Training Room Layout via Geometry-aware Ray-casting
Bolivar Solarte, Chin-Hsuan Wu, Jin-Cheng Jhang et al.
Training-Free Model Merging for Multi-target Domain Adaptation
Wenyi Li, Huan-ang Gao, Mingju Gao et al.
TrajPrompt: Aligning Color Trajectory with Vision-Language Representations
Li-Wu Tsao, Hao-Tang Tsui, Yu-Rou Tuan et al.
VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions
Seokha Moon, Hyun Woo, Hongbeen Park et al.