Jae Sung Park

Papers: 10

Total Citations: 96

Papers (10)

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

Synthetic Visual Genome

Adversarial Inference for Multi-Sentence Video Description

Fusing Pre-Trained Language Models With Multimodal Prompts Through Reinforcement Learning

VisualCOMET: Reasoning about the Dynamic Context of a Still Image

Identity-Aware Multi-Sentence Video Description

The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning

MERLOT: Multimodal Neural Script Knowledge Models

LLC: Accurate, Multi-purpose Learnt Low-dimensional Binary Codes

Localized Symbolic Knowledge Distillation for Visual Commonsense Models