Jiasen Lu

4 Papers

150 Total Citations

Papers (4)

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

One Diffusion to Generate Them All

STIV: Scalable Text and Image Conditioned Video Generation

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action