Ishan Misra
41 Papers
63 Total Citations
Papers (41)
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation
CVPR 2024
36
citations
Generating Multi-Image Synthetic Data for Text-to-Image Customization
ICCV 2025
14
citations
Generating Illustrated Instructions
CVPR 2024
7
citations
LLMs can see and hear without any training
ICML 2025
6
citations
Seeing Through the Human Reporting Bias: Visual Classifiers From Noisy Human-Centric Labels
CVPR 2016
0
citations
Cross-Stitch Networks for Multi-Task Learning
CVPR 2016
0
citations
From Red Wine to Red Tomato: Composition With Context
CVPR 2017
0
citations
Learning by Asking Questions
CVPR 2018
0
citations
ClusterFit: Improving Generalization of Visual Representations
CVPR 2020
0
citations
Self-Supervised Learning of Pretext-Invariant Representations
CVPR 2020
0
citations
In Defense of Grid Features for Visual Question Answering
CVPR 2020
0
citations
Audio-Visual Instance Discrimination with Cross-Modal Agreement
CVPR 2021
0
citations
Robust Audio-Visual Instance Discrimination
CVPR 2021
0
citations
3D Spatial Recognition Without Spatially Labeled 3D
CVPR 2021
0
citations
Omnivore: A Single Model for Many Visual Modalities
CVPR 2022
0
citations
Masked-Attention Mask Transformer for Universal Image Segmentation
CVPR 2022
0
citations
GeneCIS: A Benchmark for General Conditional Image Similarity
CVPR 2023
0
citations
OmniMAE: Single Model Masked Pretraining on Images and Videos
CVPR 2023
0
citations
ImageBind: One Embedding Space To Bind Them All
CVPR 2023
0
citations
Cut and Learn for Unsupervised Object Detection and Instance Segmentation
CVPR 2023
0
citations
Learning Video Representations From Large Language Models
CVPR 2023
0
citations
Self-Supervised Learning From Images With a Joint-Embedding Predictive Architecture
CVPR 2023
0
citations
Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection
ICCV 2017
0
citations
3D-RelNet: Joint Object and Relational Network for 3D Prediction
ICCV 2019
0
citations
Scaling and Benchmarking Self-Supervised Visual Representation Learning
ICCV 2019
0
citations
An End-to-End Transformer Model for 3D Object Detection
ICCV 2021
0
citations
Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments With Support Samples
ICCV 2021
0
citations
Self-Supervised Pretraining of 3D Features on Any Point-Cloud
ICCV 2021
0
citations
Emerging Properties in Self-Supervised Vision Transformers
ICCV 2021
0
citations
Space-Time Crop & Attend: Improving Cross-Modal Video Representation Learning
ICCV 2021
0
citations
The Effectiveness of MAE Pre-Pretraining for Billion-Scale Pretraining
ICCV 2023
0
citations
MOST: Multiple Object Localization with Self-Supervised Transformers for Object Discovery
ICCV 2023
0
citations
Detecting Twenty-Thousand Classes Using Image-Level Supervision
ECCV 2022
0
citations
Masked Siamese Networks for Label-Efficient Learning
ECCV 2022
0
citations
MDETR - Modulated Detection for End-to-End Multi-Modal Understanding
ICCV 2021
0
citations
InstanceDiffusion: Instance-level Control for Image Generation
CVPR 2024
0
citations
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
CVPR 2024
0
citations
Watch and Learn: Semi-Supervised Learning for Object Detectors From Video
CVPR 2015
0
citations
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
NeurIPS 2020
0
citations
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
NeurIPS 2021
0
citations
A Data-Augmentation Is Worth A Thousand Samples: Analytical Moments And Sampling-Free Training
NeurIPS 2022
0
citations