Stan Weixian Lei

Papers: 7

Total Citations: 123

Papers (7)

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

ViT-Lens: Towards Omni-modal Representations

Generic Event Boundary Detection: A Benchmark for Event Segmentation

Too Large; Data Reduction for Vision-Language Pre-Training

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

Learning to Learn: How to Continuously Teach Humans and Machines

AssistQ: Affordance-Centric Question-Driven Task Completion for Egocentric Assistant