Youngjae Yu

21 Papers
10 Total Citations

Papers (21)

DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding

ICCV 2025
7 citations

ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO

AAAI 2025
3 citations

VAGUE: Visual Contexts Clarify Ambiguous Expressions

ICCV 2025
0 citations

Diffusion-Driven Two-Stage Active Learning for Low-Budget Semantic Segmentation

NeurIPS 2025
0 citations

MASS: Overcoming Language Bias in Image-Text Matching

AAAI 2025
0 citations

DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation

AAAI 2025
0 citations

Supervising Neural Attention Models for Video Captioning by Human Gaze Data

CVPR 2017
0 citations

TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering

CVPR 2017
0 citations

End-To-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering

CVPR 2017
0 citations

Transitional Adaptation of Pretrained Models for Visual Storytelling

CVPR 2021
0 citations

MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound

CVPR 2022
0 citations

Fusing Pre-Trained Language Models With Multimodal Prompts Through Reinforcement Learning

CVPR 2023
0 citations

Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos

ICCV 2021
0 citations

ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning

ICCV 2021
0 citations

CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos

ICCV 2023
0 citations

Character Grounding and Re-Identification in Story of Videos and Text Descriptions

ECCV 2020
0 citations

A Memory Network Approach for Story-Based Temporal Summarization of 360° Videos

CVPR 2018
0 citations

V.I.P.: Iterative Online Preference Distillation for Efficient Video Diffusion Models

ICCV 2025
0 citations

MERLOT: Multimodal Neural Script Knowledge Models

NeurIPS 2021
0 citations

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text

NeurIPS 2023
0 citations

Localized Symbolic Knowledge Distillation for Visual Commonsense Models

NeurIPS 2023
0 citations