Yiyuan Zhang
10 Papers
43 Total Citations
Papers (10)
Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors
CVPR 2024
28
citations
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
CVPR 2024
11
citations
Breaking the Encoder Barrier for Seamless Video-Language Understanding
ICCV 2025
3
citations
MUG: Pseudo Labeling Augmented Audio-Visual Mamba Network for Audio-Visual Video Parsing
ICCV 2025
1
citation
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
CVPR 2024
0
citations
FairGen: Enhancing Fairness in Text-to-Image Diffusion Models via Self-Discovering Latent Directions
ICCV 2025
0
citations
Modality Synergy Complement Learning with Cascaded Aggregation for Visible-Infrared Person Re-identification
ECCV 2022
0
citations
Learning Beyond Still Frames: Scaling Vision-Language Models with Video
ICCV 2025
0
citations
Scaling Omni-modal Pretraining with Multimodal Context: Advancing Universal Representation Learning Across Modalities
ICCV 2025
0
citations
OneLLM: One Framework to Align All Modalities with Language
CVPR 2024
0
citations