Zilong Huang
9 Papers
80 Total Citations
Papers (9)
DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention
CVPR 2025
38 citations
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
ICCV 2025 (arXiv)
22 citations
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
ICCV 2025
20 citations
QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing
ICCV 2025
0 citations
BLINK-Twice: You see, but do you observe? A Reasoning Benchmark on Visual Perception
NeurIPS 2025 (arXiv)
0 citations
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
ICLR 2025 (arXiv)
0 citations
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
CVPR 2025
0 citations
Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration
CVPR 2025 (arXiv)
0 citations
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
CVPR 2024
0 citations