Ross Girshick

Papers: 48

Total Citations: 1,726

Papers (48)

Hypercolumns for Object Segmentation and Fine-Grained Localization

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

Aligning 3D Models to RGB-D Images of Cluttered Scenes

Training Region-Based Object Detectors With Online Hard Example Mining

You Only Look Once: Unified, Real-Time Object Detection

Inside-Outside Net: Detecting Objects in Context With Skip Pooling and Recurrent Neural Networks

Seeing Through the Human Reporting Bias: Visual Classifiers From Noisy Human-Centric Labels

Aggregated Residual Transformations for Deep Neural Networks

Feature Pyramid Networks for Object Detection

Learning Features by Watching Objects Move

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

Learning by Asking Questions

Data Distillation: Towards Omni-Supervised Learning

Learning to Segment Every Thing

Low-Shot Learning From Imaginary Data

Non-Local Neural Networks

Detecting and Recognizing Human-Object Interactions

Long-Term Feature Banks for Detailed Video Understanding

LVIS: A Dataset for Large Vocabulary Instance Segmentation

Panoptic Feature Pyramid Networks

Panoptic Segmentation

A Multigrid Method for Efficiently Training Video Models

PointRend: Image Segmentation As Rendering

Momentum Contrast for Unsupervised Visual Representation Learning

Designing Network Design Spaces

Boundary IoU: Improving Object-Centric Image Segmentation Evaluation

A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

Fast and Accurate Model Scaling

Revisiting Weakly Supervised Pre-Training of Visual Perception Models

Masked Autoencoders Are Scalable Vision Learners

Contextual Action Recognition With R*CNN

Fast R-CNN

Actions and Attributes From Wholes and Parts

Mask R-CNN

Inferring and Executing Programs for Visual Reasoning

Low-Shot Visual Recognition by Shrinking and Hallucinating Features

Exploring Randomly Wired Neural Networks for Image Recognition

TensorMask: A Foundation for Dense Object Segmentation

Rethinking ImageNet Pre-Training

The Effectiveness of MAE Pre-Pretraining for Billion-Scale Pretraining

Segment Anything

Are Labels Necessary for Neural Architecture Search?

Exploring Plain Vision Transformer Backbones for Object Detection

Faster R-CNN: Towards Real-Time Object Detection With Region Proposal Networks

Focal Loss for Dense Object Detection

Deformable Part Models Are Convolutional Neural Networks

PHYRE: A New Benchmark for Physical Reasoning

Unsupervised Deep Embedding for Clustering Analysis