Devi Parikh
57 Papers · 1,940 Total Citations

Papers (57)

Hierarchical Question-Image Co-Attention for Visual Question Answering · NeurIPS 2016 (arXiv) · 1,702 citations
Emu Edit: Precise Image Editing via Recognition and Generation Tasks · CVPR 2024 · 238 citations
Image Specificity · CVPR 2015 · 0 citations
Don't Just Listen, Use Your Imagination: Leveraging Visual Common Sense for Non-Visual Tasks · CVPR 2015 · 0 citations
CIDEr: Consensus-Based Image Description Evaluation · CVPR 2015 · 0 citations
We Are Humor Beings: Understanding and Predicting Visual Humor · CVPR 2016 · 0 citations
Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes · CVPR 2016 · 0 citations
Yin and Yang: Balancing and Answering Binary Visual Questions · CVPR 2016 · 0 citations
Joint Unsupervised Learning of Deep Representations and Image Clusters · CVPR 2016 · 0 citations
Context-Aware Captions From Context-Agnostic Supervision · CVPR 2017 (arXiv) · 0 citations
Visual Dialog · CVPR 2017 (arXiv) · 0 citations
Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning · CVPR 2017 (arXiv) · 0 citations
Counting Everyday Objects in Everyday Scenes · CVPR 2017 (arXiv) · 0 citations

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering · CVPR 2017 · 0 citations

Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering · CVPR 2018 (arXiv) · 0 citations
Neural Baby Talk · CVPR 2018 (arXiv) · 0 citations
Cycle-Consistency for Robust Visual Question Answering · CVPR 2019 · 0 citations
Embodied Question Answering in Photorealistic Environments With Point Cloud Perception · CVPR 2019 · 0 citations
Audio Visual Scene-Aware Dialog · CVPR 2019 · 0 citations
Towards VQA Models That Can Read · CVPR 2019 · 0 citations
SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions · CVPR 2020 (arXiv) · 0 citations
12-in-1: Multi-Task Vision and Language Representation Learning · CVPR 2020 · 0 citations
Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs · CVPR 2021 (arXiv) · 0 citations
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA · CVPR 2021 (arXiv) · 0 citations
Episodic Memory Question Answering · CVPR 2022 (arXiv) · 0 citations
SpaText: Spatio-Textual Representation for Controllable Image Generation · CVPR 2023 (arXiv) · 0 citations

VQA: Visual Question Answering · ICCV 2015 · 0 citations
Learning Common Sense Through Visual Abstraction · ICCV 2015 · 0 citations
Grad-CAM: Visual Explanations From Deep Networks via Gradient-Based Localization · ICCV 2017 · 0 citations
SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation · ICCV 2019 · 0 citations
Embodied Amodal Recognition: Learning to Move to Perceive Objects · ICCV 2019 · 0 citations
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded · ICCV 2019 · 0 citations
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment · ICCV 2019 · 0 citations
Fashion++: Minimal Edits for Outfit Improvement · ICCV 2019 · 0 citations
nocaps: novel object captioning at scale · ICCV 2019 · 0 citations
Habitat: A Platform for Embodied AI Research · ICCV 2019 · 0 citations
Contrast and Classify: Training Robust VQA Models · ICCV 2021 (arXiv) · 0 citations
Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation · ICCV 2023 · 0 citations

Improving Vision-and-Language Navigation with Image-Text Pairs from the Web · ECCV 2020 · 0 citations
Spatially Aware Multimodal Transformers for TextVQA · ECCV 2020 · 0 citations
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline · ECCV 2020 · 0 citations
Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation · ECCV 2020 · 0 citations
MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration · ECCV 2022 · 0 citations
Make-a-Scene: Scene-Based Text-to-Image Generation with Human Priors · ECCV 2022 · 0 citations
Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer · ECCV 2022 · 0 citations

Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model · NeurIPS 2017 (arXiv) · 0 citations
Embodied Question Answering · CVPR 2018 (arXiv) · 0 citations
Understanding Image Virality · CVPR 2015 · 0 citations
RUBi: Reducing Unimodal Biases for Visual Question Answering · NeurIPS 2019 · 0 citations
Chasing Ghosts: Instruction Following as Bayesian State Tracking · NeurIPS 2019 · 0 citations
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks · NeurIPS 2019 · 0 citations
Cross-channel Communication Networks · NeurIPS 2019 · 0 citations
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data · NeurIPS 2020 · 0 citations
Human-Adversarial Visual Question Answering · NeurIPS 2021 · 0 citations
TarMAC: Targeted Multi-Agent Communication · ICML 2019 · 0 citations
Counterfactual Visual Explanations · ICML 2019 · 0 citations
Probabilistic Neural Symbolic Models for Interpretable Visual Question Answering · ICML 2019 · 0 citations