Yin Cui

23 Papers

88 Total Citations

Papers (23)

Describe Anything: Detailed Localized Image and Video Captioning

Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation

ArtiScene: Language-Driven Artistic 3D Scene Generation Through Image Intermediary

Kernel Pooling for Convolutional Neural Networks

Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning

Learning to Evaluate Image Captioning

The iNaturalist Species Classification and Detection Dataset

Class-Balanced Loss Based on Effective Number of Samples

SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

Spatiotemporal Contrastive Video Representation Learning

Contextualized Spatio-Temporal Contrastive Learning With Self-Supervision

Train-Once-for-All Personalization

Unified Visual Relationship Detection with Vision and Language Models

Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset

Scaling Open-Vocabulary Image Segmentation with Image-Level Labels

Simple Copy-Paste Is a Strong Data Augmentation Method for Instance Segmentation

Learning Deep Representations for Ground-to-Aerial Geolocalization

Fine-Grained Categorization and Dataset Bootstrapping Using Deep Metric Learning With Humans in the Loop

Rethinking Pre-training and Self-training

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model

Module-wise Adaptive Distillation for Multimodality Foundation Models

Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception