Biao Gong

Papers: 17
Total Citations: 307

Papers (17)

Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following

Animate-X: Universal Character Image Animation with Enhanced Motion Representation

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation

StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models

ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance

Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning

Mimir: Improving Video Diffusion Models for Precise Text Understanding

MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation

Learning Visual Generative Priors without Text

Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation

Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning

DreamRelation: Relation-Centric Video Customization

VoP: Text-Video Co-Operative Prompt Tuning for Cross-Modal Retrieval

ViM: Vision Middleware for Unified Downstream Transferring

ObjectRelator: Enabling Cross-View Object Relation Understanding Across Ego-Centric and Exo-Centric Perspectives

Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos