Serena Yeung

Papers: 20

Total Citations: 72

Papers (20)

Describing Differences in Image Sets with Natural Language

Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

Apollo: An Exploration of Video Understanding in Large Multimodal Models

End-To-End Learning of Action Detection From Frame Glimpses in Videos

Jointly Learning Energy Expenditures and Activities Using Egocentric Multimodal Signals

Learning to Learn From Noisy Web Videos

Holistic 3D Human and Scene Mesh Estimation From Single View Images

Unsupervised Discovery of the Long-Tail in Instance Segmentation Using Hierarchical Self-Supervision

PROB: Probabilistic Objectness for Open World Object Detection

NeMo: Learning 3D Neural Motion Fields From Multiple Video Instances of the Same Action

GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-Efficient Medical Image Recognition

Generalizable Neural Fields as Partially Observed Neural Processes

DARCNN: Domain Adaptive Region-Based Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images

MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research

Capturing implicit hierarchical structure in 3D biomedical images with self-supervised hyperbolic representations

Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning

DataPerf: Benchmarks for Data-Centric AI Development

INSPECT: A Multimodal Dataset for Patient Outcome Prediction of Pulmonary Embolisms

LOVM: Language-Only Vision Model Selection