Rui Shao

Papers: 8

Total Citations: 47

Papers (8)

LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant

FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers

Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation

LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge

Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy

RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models

Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation

Less is More: Empowering GUI Agent with Context-Aware Simplification