Poster "spatial reasoning" Papers
15 papers found
ChatVLA-2: Vision-Language-Action Model with Open-World Reasoning
Zhongyi Zhou, Yichen Zhu, Xiaoyu Liu et al.
NeurIPS 2025 poster
Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning
Xingjian Ran, Yixuan Li, Linning Xu et al.
NeurIPS 2025 poster · arXiv:2506.05341
5 citations
Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation
Jitesh Jain, Zhengyuan Yang, Humphrey Shi et al.
NeurIPS 2025 poster · arXiv:2412.09585
4 citations
Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs
Yifan Shen, Yuanzhe Liu, Jingyuan Zhu et al.
NeurIPS 2025 poster · arXiv:2506.21656
3 citations
From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes
Tianxu Wang, Zhuofan Zhang, Ziyu Zhu et al.
NeurIPS 2025 poster · arXiv:2506.04897
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors
Duo Zheng, Shijia Huang, Yanyang Li et al.
NeurIPS 2025 poster · arXiv:2505.24625
24 citations
Locality Alignment Improves Vision-Language Models
Ian Covert, Tony Sun, James Y Zou et al.
ICLR 2025 poster · arXiv:2410.11087
Re-Thinking Inverse Graphics With Large Language Models
Haiwen Feng, Michael J Black, Weiyang Liu et al.
ICLR 2025 poster · arXiv:2404.15228
15 citations
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics
Dongyoung Kim, Huiwon Jang, Sumin Park et al.
NeurIPS 2025 poster · arXiv:2506.00070
9 citations
Robust Cross-modal Alignment Learning for Cross-Scene Spatial Reasoning and Grounding
Yanglin Feng, Hongyuan Zhu, Dezhong Peng et al.
NeurIPS 2025 poster
SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Model
Yue Zhang, Zhiyang Xu, Ying Shen et al.
ICLR 2025 poster · arXiv:2410.03878
19 citations
Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction
Junyi Chen, Di Huang, Weicai Ye et al.
ICLR 2025 poster · arXiv:2410.18962
4 citations
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Guez Aflalo et al.
ECCV 2024 poster · arXiv:2404.01197
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Soroush Nasiriany, Fei Xia, Wenhao Yu et al.
ICML 2024 poster
REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models
Agneet Chatterjee, Yiran Luo, Tejas Gokhale et al.
ECCV 2024 poster · arXiv:2408.02231
10 citations