David Harwath
10 Papers
314 Total Citations
Papers (10)
Unsupervised Learning of Spoken Language with Visual Context
NeurIPS 2016
256 citations
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
ICLR 2025 (arXiv)
22 citations
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
ECCV 2024
19 citations
SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos
CVPR 2024
11 citations
VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
ICCV 2025
6 citations
Learning Words by Drawing Images
CVPR 2019
0 citations
Spoken Moments: Learning Joint Audio-Visual Representations From Video Descriptions
CVPR 2021 (arXiv)
0 citations
Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval
CVPR 2022 (arXiv)
0 citations
Multimodal Clustering Networks for Self-Supervised Learning From Unlabeled Videos
ICCV 2021 (arXiv)
0 citations
BAT: Learning to Reason about Spatial Sounds with Large Language Models
ICML 2024
0 citations