Zilong Huang

8 Papers

90 Total Citations

Papers (8)

DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer

Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration

BLINK-Twice: You see, but do you observe? A Reasoning Benchmark on Visual Perception

NeurIPS 2025, arXiv

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing

Video Depth Anything: Consistent Depth Estimation for Super-Long Videos