Zilong Huang

9 Papers

80 Total Citations

Papers (9)

DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer

QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing

BLINK-Twice: You see, but do you observe? A Reasoning Benchmark on Visual Perception

NeurIPS 2025, arXiv

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

Video Depth Anything: Consistent Depth Estimation for Super-Long Videos

Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data