Yao Lu

21 Papers

953 Total Citations

Papers (21)

VILA: On Pre-training for Visual Language Models

CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models

WorldModelBench: Judging Video Generation Models As World Models

VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge

DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer

SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference

Scaling Vision Pre-Training to 4K Resolution

Coherent Parametric Contours for Interactive Video Object Segmentation

Learning Optical Flow From a Few Matches

Taskology: Utilizing Task Relations at Scale

Token Turing Machines

Contour Flow: Middle-Level Motion Estimation by Combining Motion Segmentation and Contour Alignment

Learning To Estimate Hidden Motions With Global Motion Aggregation

Understanding the Dynamics of DNNs Using Graph Modularity

NVILA: Efficient Frontier Visual Language Models

RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models

A Set of Generalized Components to Achieve Effective Poison-only Clean-label Backdoor Attacks with Collaborative Sample Selection and Triggers

ALRMR-GEC: Adjusting Learning Rate Based on Memory Rate to Optimize the Edit Scorer for Grammatical Error Correction

Delayed Gradient Averaging: Tolerate the Communication Latency for Federated Learning

Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research

Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents