"embodied ai" Papers

22 papers found

3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

Jianing "Jed" Yang, Xuweiyi Chen, Nikhil Madaan et al.

CVPR 2025, arXiv:2406.05132
30 citations

BadRobot: Jailbreaking Embodied LLM Agents in the Physical World

Hangtao Zhang, Chenyu Zhu, Xianlong Wang et al.

ICLR 2025
6 citations

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

Zun Wang, Jialu Li, Yicong Hong et al.

ICLR 2025, arXiv:2412.08467
10 citations

CL-Splats: Continual Learning of Gaussian Splatting with Local Optimization

Jan Ackermann, Jonas Kulhanek, Shengqu Cai et al.

ICCV 2025, arXiv:2506.21117
6 citations

Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation

Zihan Wang, Seungjun Lee, Gim Hee Lee

NeurIPS 2025 (oral), arXiv:2505.11383
7 citations

Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation

Jitesh Jain, Zhengyuan Yang, Humphrey Shi et al.

NeurIPS 2025, arXiv:2412.09585
4 citations

EmbodiedSAM: Online Segment Any 3D Thing in Real Time

Xiuwei Xu, Huangxing Chen, Linqing Zhao et al.

ICLR 2025, arXiv:2408.11811
35 citations

Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding

Yue Fan, Xiaojian Ma, Rongpeng Su et al.

ICCV 2025 (highlight), arXiv:2501.00358
12 citations

ManiSkill-HAB: A Benchmark for Low-Level Manipulation in Home Rearrangement Tasks

Arth Shukla, Stone Tao, Hao Su

ICLR 2025, arXiv:2412.13211
15 citations

METASCENES: Towards Automated Replica Creation for Real-world 3D Scans

Huangyue Yu, Baoxiong Jia, Yixin Chen et al.

CVPR 2025, arXiv:2505.02388
13 citations

MrSteve: Instruction-Following Agents in Minecraft with What-Where-When Memory

Junyeong Park, Junmo Cho, Sungjin Ahn

ICLR 2025, arXiv:2411.06736
5 citations

Reasoning in Visual Navigation of End-to-end Trained Agents: A Dynamical Systems Approach

Steeven Janny, Hervé Poirier, Leonid Antsfeld et al.

CVPR 2025 (highlight), arXiv:2503.08306
4 citations

Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities

Liuyi Wang, Xinyuan Xia, Hui Zhao et al.

ICCV 2025, arXiv:2507.13019
7 citations

SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent

Yandan Yang, Baoxiong Jia, Shujie Zhang et al.

NeurIPS 2025, arXiv:2509.20414
11 citations

SPA: 3D Spatial-Awareness Enables Effective Embodied Representation

Haoyi Zhu, Honghui Yang, Yating Wang et al.

ICLR 2025, arXiv:2410.08208
24 citations

STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?

Yun Li, Yiming Zhang, Tao Lin et al.

ICCV 2025, arXiv:2503.23765
38 citations

TC-Light: Temporally Coherent Generative Rendering for Realistic World Transfer

Yang Liu, Chuanchen Luo, Zimo Tang et al.

NeurIPS 2025 (oral), arXiv:2506.18904
5 citations

Training-Free Generation of Temporally Consistent Rewards from VLMs

Yinuo Zhao, Jiale Yuan, Zhiyuan Xu et al.

ICCV 2025, arXiv:2507.04789
2 citations

An Embodied Generalist Agent in 3D World

Jiangyong Huang, Silong Yong, Xiaojian Ma et al.

ICML 2024, arXiv:2311.12871
305 citations

ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation

Xiaoqi Li, Mingxu Zhang, Yiran Geng et al.

CVPR 2024, arXiv:2312.16217
182 citations

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

Yao Mu, Junting Chen, Qing-Long Zhang et al.

ICML 2024, arXiv:2402.16117
46 citations

Where am I? Scene Retrieval with Language

Jiaqi Chen, Daniel Barath, Iro Armeni et al.

ECCV 2024, arXiv:2404.14565
14 citations