Shusheng Yang

Papers: 7

Total Citations: 352

Papers (7)

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

MobileInst: Video Instance Segmentation on the Mobile

Temporally Efficient Vision Transformer for Video Instance Segmentation

RILS: Masked Visual Reconstruction in Language Semantic Space

Instances As Queries

Crossover Learning for Fast Online Video Instance Segmentation

Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection