"spatial reasoning" Papers
17 papers found
ChatVLA-2: Vision-Language-Action Model with Open-World Reasoning
Zhongyi Zhou, Yichen Zhu, Xiaoyu Liu et al.
Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning
Xingjian Ran, Yixuan Li, Linning Xu et al.
Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation
Jitesh Jain, Zhengyuan Yang, Humphrey Shi et al.
Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs
Yifan Shen, Yuanzhe Liu, Jingyuan Zhu et al.
From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes
Tianxu Wang, Zhuofan Zhang, Ziyu Zhu et al.
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors
Duo Zheng, Shijia Huang, Yanyang Li et al.
Locality Alignment Improves Vision-Language Models
Ian Covert, Tony Sun, James Y Zou et al.
Re-Thinking Inverse Graphics With Large Language Models
Haiwen Feng, Michael J Black, Weiyang Liu et al.
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics
Dongyoung Kim, Huiwon Jang, Sumin Park et al.
Robust Cross-modal Alignment Learning for Cross-Scene Spatial Reasoning and Grounding
Yanglin Feng, Hongyuan Zhu, Dezhong Peng et al.
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
Zekun Qi, Wenyao Zhang, Yufei Ding et al.
SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Model
Yue Zhang, Zhiyang Xu, Ying Shen et al.
Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction
Junyi Chen, Di Huang, Weicai Ye et al.
Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark
Fangjun Li, David C. Hogg, Anthony G. Cohn
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Guez Aflalo et al.
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Soroush Nasiriany, Fei Xia, Wenhao Yu et al.
REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models
Agneet Chatterjee, Yiran Luo, Tejas Gokhale et al.