Shentong Mo
16 Papers · 42 Total Citations
Papers (16)
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
CVPR 2024 · 31 citations
Audio-visual Generalized Zero-shot Learning the Easy Way
ECCV 2024 · 7 citations
Scaling Diffusion Mamba with Bidirectional SSMs for Efficient 3D Shape Generation
AAAI 2025 · 3 citations
The Dynamic Duo of Collaborative Masking and Target for Advanced Masked Autoencoder Learning
AAAI 2025 · 1 citation
Audio-Visual Class-Incremental Learning
ICCV 2023 · arXiv · 0 citations
Unitail: Detecting, Reading, and Matching in Retail Scene
ECCV 2022 · 0 citations
Localizing Visual Sounds the Easy Way
ECCV 2022 · 0 citations
Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows
CVPR 2025 · 0 citations
GMAIL: Generative Modality Alignment for generated Image Learning
ICML 2025 · 0 citations
Audio-Visual Grouping Network for Sound Localization From Mixtures
CVPR 2023 · arXiv · 0 citations
Class-Incremental Grouping Network for Continual Audio-Visual Learning
ICCV 2023 · arXiv · 0 citations
Multi-modal Grouping Network for Weakly-Supervised Audio-Visual Video Parsing
NeurIPS 2022 · 0 citations
A Closer Look at Weakly-Supervised Audio-Visual Source Localization
NeurIPS 2022 · 0 citations
Weakly-Supervised Audio-Visual Segmentation
NeurIPS 2023 · 0 citations
DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation
NeurIPS 2023 · 0 citations
DiffComplete: Diffusion-based Generative 3D Shape Completion
NeurIPS 2023 · 0 citations