Zilong Huang

16 Papers

80 Total Citations

Papers (16)

DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Weakly-Supervised Semantic Segmentation Network With Deep Seeded Region Growing

Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis

Human De-Occlusion: Invisible Perception and Recovery for Humans

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation

Executing Your Commands via Motion Diffusion in Latent Space

Object-Level Proposals

CCNet: Criss-Cross Attention for Semantic Segmentation

SPGNet: Semantic Prediction Guidance for Scene Parsing

Video Depth Anything: Consistent Depth Estimation for Super-Long Videos

Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration

QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing

Coordinates Are NOT Lonely - Codebook Prior Helps Implicit Neural 3D representations