Yong Man Ro

Papers: 6

Total Citations: 21

Papers (6)

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation

SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis

VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models

Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations

Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language

Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection