Mark Yatskar

Papers: 12

Total Citations: 146

Papers (12)

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

CoMo: Controllable Motion Generation through Language Guided Pose Code Editing

ViUniT: Visual Unit Tests for More Robust Visual Programming

Commonly Uncommon: Semantic Sparsity in Situation Recognition

Neural Motifs: Scene Graph Parsing With Global Context

RoboTHOR: An Open Simulation-to-Real Embodied AI Platform

Visual Semantic Role Labeling for Video Understanding

Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification

Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations

Grounded Situation Recognition

Holodeck: Language Guided Generation of 3D Embodied AI Environments

Situation Recognition: Visual Semantic Role Labeling for Image Understanding