Zhou Zhao

21
Papers
333
Total Citations

Papers (21)

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

ICLR 2025 (arXiv)
125
citations

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis

ICLR 2024
74
citations

Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations

AAAI 2024 (arXiv)
49
citations

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

ICML 2025
28
citations

TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching

AAAI 2025
16
citations

RoboGround: Robotic Manipulation with Grounded Vision-Language Priors

CVPR 2025 (arXiv)
15
citations

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup

ICLR 2025
10
citations

MergeNet: Knowledge Migration Across Heterogeneous Models, Tasks, and Modalities

AAAI 2025
8
citations

FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation

CVPR 2025 (arXiv)
5
citations

Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations

ICCV 2025
3
citations

Towards Transformer-Based Aligned Generation with Self-Coherence Guidance

CVPR 2025
0
citations

MPOD123: One Image to 3D Content Generation Using Mask-enhanced Progressive Outline-to-Detail Optimization

CVPR 2024
0
citations

Non-Natural Image Understanding with Advancing Frequency-based Vision Encoders

CVPR 2025
0
citations

Dataflow-Guided Neuro-Symbolic Language Models for Type Inference

ICML 2025
0
citations

InstructSpeech: Following Speech Editing Instructions via Large Language Models

ICML 2024
0
citations

Non-confusing Generation of Customized Concepts in Diffusion Models

ICML 2024
0
citations

UniAudio: Towards Universal Audio Generation with Large Language Models

ICML 2024
0
citations

FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion

ICML 2024
0
citations

Open-set Cross Modal Generalization via Multimodal Unified Representation

ICCV 2025
0
citations

SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language

CVPR 2025
0
citations

Speech Watermarking with Discrete Intermediate Representations

AAAI 2025
0
citations