Youngjae Yu
21
Papers
10
Total Citations
Papers (21)
DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding
ICCV 2025
7
citations
ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO
AAAI 2025
3
citations
VAGUE: Visual Contexts Clarify Ambiguous Expressions
ICCV 2025
0
citations
Diffusion-Driven Two-Stage Active Learning for Low-Budget Semantic Segmentation
NeurIPS 2025
0
citations
MASS: Overcoming Language Bias in Image-Text Matching
AAAI 2025
0
citations
DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation
AAAI 2025
0
citations
Supervising Neural Attention Models for Video Captioning by Human Gaze Data
CVPR 2017
0
citations
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
CVPR 2017
0
citations
End-To-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering
CVPR 2017
0
citations
Transitional Adaptation of Pretrained Models for Visual Storytelling
CVPR 2021
0
citations
MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound
CVPR 2022
0
citations
Fusing Pre-Trained Language Models With Multimodal Prompts Through Reinforcement Learning
CVPR 2023
0
citations
Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos
ICCV 2021
0
citations
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning
ICCV 2021
0
citations
CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos
ICCV 2023
0
citations
Character Grounding and Re-Identification in Story of Videos and Text Descriptions
ECCV 2020
0
citations
A Memory Network Approach for Story-Based Temporal Summarization of 360° Videos
CVPR 2018
0
citations
V.I.P.: Iterative Online Preference Distillation for Efficient Video Diffusion Models
ICCV 2025
0
citations
MERLOT: Multimodal Neural Script Knowledge Models
NeurIPS 2021
0
citations
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text
NeurIPS 2023
0
citations
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
NeurIPS 2023
0
citations