2025 "vision-language-action models" Papers
14 papers found
BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization
Xueyang Zhou, Guiyao Tie, Guowen Zhang et al.
NEURIPS 2025 (poster) · arXiv:2505.16640 · 11 citations
BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models
Peiyan Li, Yixiang Chen, Hongtao Wu et al.
NEURIPS 2025 (poster) · arXiv:2506.07961 · 27 citations
ChatVLA-2: Vision-Language-Action Model with Open-World Reasoning
Zhongyi Zhou, Yichen Zhu, Xiaoyu Liu et al.
NEURIPS 2025 (poster)
EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models
Yantai Yang, Yuhao Wang, Zichen Wen et al.
NEURIPS 2025 (oral) · arXiv:2506.10100 · 31 citations
Exploring the Limits of Vision-Language-Action Manipulation in Cross-task Generalization
Jiaming Zhou, Ke Ye, Jiayi Liu et al.
NEURIPS 2025 (poster) · arXiv:2505.15660 · 16 citations
FedVLA: Federated Vision-Language-Action Learning with Dual Gating Mixture-of-Experts for Robotic Manipulation
Cui Miao, Tao Chang, Meihan Wu et al.
ICCV 2025 (poster) · arXiv:2508.02190 · 5 citations
Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better
Danny Driess, Jost Springenberg, Brian Ichter et al.
NEURIPS 2025 (spotlight) · arXiv:2505.23705 · 46 citations
Latent Action Pretraining from Videos
Seonghyeon Ye, Joel Jang, Byeongguk Jeon et al.
ICLR 2025 (poster) · arXiv:2410.11758
Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control
Seongmin Park, Hyungmin Kim, Sangwoo Kim et al.
ICCV 2025 (poster) · arXiv:2505.15304 · 1 citation
UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning
Xiangyu Wang, Donglin Yang, Yue Liao et al.
NEURIPS 2025 (poster) · arXiv:2505.15725 · 9 citations
VideoVLA: Video Generators Can Be Generalizable Robot Manipulators
Yichao Shen, Fangyun Wei, Zhiying Du et al.
NEURIPS 2025 (poster) · arXiv:2512.06963 · 3 citations
VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching
Siyu Xu, Yunke Wang, Chenghao Xia et al.
NEURIPS 2025 (oral) · arXiv:2502.02175 · 27 citations
VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models
Chongkai Gao, Zixuan Liu, Zhenghao Chi et al.
NEURIPS 2025 (poster) · arXiv:2506.17561 · 8 citations
VLAS: Vision-Language-Action Model with Speech Instructions for Customized Robot Manipulation
Wei Zhao, Pengxiang Ding, Zhang Min et al.
ICLR 2025 (poster) · arXiv:2502.13508 · 37 citations