Yong Man Ro
6
Papers
21
Total Citations
Papers (6)
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
CVPR 2024
16
citations
SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis
CVPR 2025, arXiv
5
citations
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
CVPR 2025
0
citations
Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations
ICCV 2025
0
citations
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
AAAI 2025
0
citations
Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection
CVPR 2024
0
citations