Pengxiang Ding

Papers: 9

Total Citations: 242

Papers (9)

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

VLAS: Vision-Language-Action Model with Speech Instructions for Customized Robot Manipulation

CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction

ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning

SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning

GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation

PiTe: Pixel-Temporal Alignment for Large Video-Language Model

Expressive Forecasting of 3D Whole-Body Human Motions

Score-Based Diffusion Policy Compatible with Reinforcement Learning via Optimal Transport