"spatial reasoning" Papers

36 papers found

Conference

AAAI 2025 (3,028)COLM 2025 (418)CVPR 2025 (2,873)ICCV 2025 (2,701)ICLR 2025 (3,827)ICML 2025 (3,340)ISMAR 2025 (229)NEURIPS 2025 (5,858)AAAI 2024 (2,289)CVPR 2024 (2,716)ECCV 2024 (2,387)ICLR 2024 (2,297)ICML 2024 (2,635)

Paper Type

poster (24,624)paper (8,558)oral (1,594)spotlight (1,421)highlight (975)

AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations

Junli Liu, Qizhi Chen, Zhigang Wang et al.

ICCV 2025arXiv:2504.07836

citations

An Embodied AR Navigation Agent: Integrating BIM with Retrieval-Augmented Generation for Language Guidance

Hsuan-Kung Yang, Tsu-Ching Hsiao, Ryoichiro Oka et al.

ISMAR 2025paperarXiv:2508.16602

citations

CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs

Siyu Wang, Cailian Chen, Xinyi Le et al.

AAAI 2025paperarXiv:2412.19663

citations

Can Large Language Models Integrate Spatial Data? Empirical Insights into Reasoning Strengths and Computational Weaknesses

Bin HAN, Robert Wolfe, Anat Caspi et al.

COLM 2025paperarXiv:2508.05009

citations

CAPTURE: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting

Atin Pothiraj, Jaemin Cho, Elias Stengel-Eskin et al.

ICCV 2025arXiv:2504.15485

citations

ChatVLA-2: Vision-Language-Action Model with Open-World Reasoning

Zhongyi Zhou, Yichen Zhu, Xiaoyu Liu et al.

NEURIPS 2025

CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos

Xinhao Liu, Jintong Li, Yicheng Jiang et al.

CVPR 2025arXiv:2411.17820

citations

Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning

Xingjian Ran, Yixuan Li, Linning Xu et al.

NEURIPS 2025arXiv:2506.05341

citations

Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation

Jitesh Jain, Zhengyuan Yang, Humphrey Shi et al.

NEURIPS 2025arXiv:2412.09585

citations

Factorio Learning Environment

Jack Hopkins, Mart Bakler, Akbir Khan

NEURIPS 2025arXiv:2503.09617

citations

Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs

Yifan Shen, Yuanzhe Liu, Jingyuan Zhu et al.

NEURIPS 2025arXiv:2506.21656

citations

From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes

Tianxu Wang, Zhuofan Zhang, Ziyu Zhu et al.

NEURIPS 2025arXiv:2506.04897

citations

Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description

Mahmoud Ahmed, Junjie Fei, Jian Ding et al.

ICCV 2025arXiv:2405.18937

citations

Knot So Simple: A Minimalistic Environment for Spatial Reasoning

Zizhao Chen, Yoav Artzi

NEURIPS 2025arXiv:2505.18028

Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors

Duo Zheng, shijia Huang, Yanyang Li et al.

NEURIPS 2025arXiv:2505.24625

citations

Locality Alignment Improves Vision-Language Models

Ian Covert, Tony Sun, James Y Zou et al.

ICLR 2025arXiv:2410.11087

citations

ORIGAMISPACE: Benchmarking Multimodal LLMs in Multi-Step Spatial Reasoning with Mathematical Constraints

Rui Xu, Dakuan Lu, Zicheng Zhao et al.

NEURIPS 2025spotlightarXiv:2511.18450

citations

Re-Thinking Inverse Graphics With Large Language Models

Haiwen Feng, Michael J Black, Weiyang Liu et al.

ICLR 2025arXiv:2404.15228

citations

RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics

Chan Hee Song, Valts Blukis, Jonathan Tremblay et al.

CVPR 2025arXiv:2411.16537

citations

Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics

Dongyoung Kim, Huiwon Jang, Sumin Park et al.

NEURIPS 2025arXiv:2506.00070

citations

Robust Cross-modal Alignment Learning for Cross-Scene Spatial Reasoning and Grounding

Yanglin Feng, Hongyuan Zhu, Dezhong Peng et al.

NEURIPS 2025

SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models

Arijit Ray, Jiafei Duan, Ellis L Brown II et al.

COLM 2025paperarXiv:2412.07755

citations

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

Hongbo Liu, Jingwen He, Yi Jin et al.

NEURIPS 2025arXiv:2506.21356

citations

SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

Zekun Qi, Wenyao Zhang, Yufei Ding et al.

NEURIPS 2025spotlightarXiv:2502.13143

citations

SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Model

Yue Zhang, Zhiyang Xu, Ying Shen et al.

ICLR 2025arXiv:2410.03878

citations

Spatially-aware Weights Tokenization for NeRF-Language Models

Andrea Amaduzzi, Pierluigi Zama Ramirez, Giuseppe Lisanti et al.

NEURIPS 2025

Spatial Understanding from Videos: Structured Prompts Meet Simulation Data

Haoyu Zhang, Meng Liu, Zaijing Li et al.

NEURIPS 2025spotlightarXiv:2506.03642

citations

SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning

Yang Liu, Ming Ma, Xiaomin Yu et al.

NEURIPS 2025arXiv:2505.12448

citations

Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs

Fangrui Zhu, Hanhui Wang, Yiming Xie et al.

NEURIPS 2025arXiv:2506.04220

Visual Agentic AI for Spatial Reasoning with a Dynamic API

Damiano Marsili, Rohun Agrawal, Yisong Yue et al.

CVPR 2025arXiv:2502.06787

citations

Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction

Junyi Chen, Di Huang, Weicai Ye et al.

ICLR 2025arXiv:2410.18962

citations

Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark

Fangjun Li, David C. Hogg, Anthony G. Cohn

AAAI 2024paperarXiv:2401.03991

citations

Generating Human Motion in 3D Scenes from Text Descriptions

Zhi Cen, Huaijin Pi, Sida Peng et al.

CVPR 2024arXiv:2405.07784

citations

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Guez Aflalo et al.

ECCV 2024arXiv:2404.01197

citations

PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

Soroush Nasiriany, Fei Xia, Wenhao Yu et al.

ICML 2024arXiv:2402.07872

188

citations

REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models

Agneet Chatterjee, Yiran Luo, Tejas Gokhale et al.

ECCV 2024arXiv:2408.02231

citations