Yao Lu

11 Papers

953 Total Citations

Papers (11)

VILA: On Pre-training for Visual Language Models

CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models

WorldModelBench: Judging Video Generation Models As World Models

VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge

DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer

SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference

NVILA: Efficient Frontier Visual Language Models

Scaling Vision Pre-Training to 4K Resolution

RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models

A Set of Generalized Components to Achieve Effective Poison-only Clean-label Backdoor Attacks with Collaborative Sample Selection and Triggers

ALRMR-GEC: Adjusting Learning Rate Based on Memory Rate to Optimize the Edit Scorer for Grammatical Error Correction