Zhou Zhao
21 Papers
333 Total Citations
Papers (21)
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
ICLR 2025
125
citations
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
ICLR 2024
74
citations
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations
AAAI 2024
49
citations
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models
ICML 2025
28
citations
TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching
AAAI 2025
16
citations
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
CVPR 2025
15
citations
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
ICLR 2025
10
citations
MergeNet: Knowledge Migration Across Heterogeneous Models, Tasks, and Modalities
AAAI 2025
8
citations
FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation
CVPR 2025
5
citations
Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations
ICCV 2025
3
citations
Towards Transformer-Based Aligned Generation with Self-Coherence Guidance
CVPR 2025
0
citations
MPOD123: One Image to 3D Content Generation Using Mask-enhanced Progressive Outline-to-Detail Optimization
CVPR 2024
0
citations
Non-Natural Image Understanding with Advancing Frequency-based Vision Encoders
CVPR 2025
0
citations
Dataflow-Guided Neuro-Symbolic Language Models for Type Inference
ICML 2025
0
citations
InstructSpeech: Following Speech Editing Instructions via Large Language Models
ICML 2024
0
citations
Non-confusing Generation of Customized Concepts in Diffusion Models
ICML 2024
0
citations
UniAudio: Towards Universal Audio Generation with Large Language Models
ICML 2024
0
citations
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion
ICML 2024
0
citations
Open-set Cross Modal Generalization via Multimodal Unified Representation
ICCV 2025
0
citations
SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language
CVPR 2025
0
citations
Speech Watermarking with Discrete Intermediate Representations
AAAI 2025
0
citations