Devi Parikh
57 Papers · 1,940 Total Citations

Papers (57)
Hierarchical Question-Image Co-Attention for Visual Question Answering · NeurIPS 2016 (arXiv) · 1,702 citations
Emu Edit: Precise Image Editing via Recognition and Generation Tasks · CVPR 2024 · 238 citations
Image Specificity · CVPR 2015 · 0 citations
Don't Just Listen, Use Your Imagination: Leveraging Visual Common Sense for Non-Visual Tasks · CVPR 2015 · 0 citations
CIDEr: Consensus-Based Image Description Evaluation · CVPR 2015 · 0 citations
We Are Humor Beings: Understanding and Predicting Visual Humor · CVPR 2016 · 0 citations
Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes · CVPR 2016 · 0 citations
Yin and Yang: Balancing and Answering Binary Visual Questions · CVPR 2016 · 0 citations
Joint Unsupervised Learning of Deep Representations and Image Clusters · CVPR 2016 · 0 citations
Context-Aware Captions From Context-Agnostic Supervision · CVPR 2017 (arXiv) · 0 citations
Visual Dialog · CVPR 2017 (arXiv) · 0 citations
Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning · CVPR 2017 (arXiv) · 0 citations
Counting Everyday Objects in Everyday Scenes · CVPR 2017 (arXiv) · 0 citations
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering · CVPR 2017 · 0 citations
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering · CVPR 2018 (arXiv) · 0 citations
Neural Baby Talk · CVPR 2018 (arXiv) · 0 citations
Cycle-Consistency for Robust Visual Question Answering · CVPR 2019 · 0 citations
Embodied Question Answering in Photorealistic Environments With Point Cloud Perception · CVPR 2019 · 0 citations
Audio Visual Scene-Aware Dialog · CVPR 2019 · 0 citations
Towards VQA Models That Can Read · CVPR 2019 · 0 citations
SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions · CVPR 2020 (arXiv) · 0 citations
12-in-1: Multi-Task Vision and Language Representation Learning · CVPR 2020 · 0 citations
Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs · CVPR 2021 (arXiv) · 0 citations
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA · CVPR 2021 (arXiv) · 0 citations
Episodic Memory Question Answering · CVPR 2022 (arXiv) · 0 citations
SpaText: Spatio-Textual Representation for Controllable Image Generation · CVPR 2023 (arXiv) · 0 citations
VQA: Visual Question Answering · ICCV 2015 · 0 citations
Learning Common Sense Through Visual Abstraction · ICCV 2015 · 0 citations
Grad-CAM: Visual Explanations From Deep Networks via Gradient-Based Localization · ICCV 2017 · 0 citations
SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation · ICCV 2019 · 0 citations
Embodied Amodal Recognition: Learning to Move to Perceive Objects · ICCV 2019 · 0 citations
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded · ICCV 2019 · 0 citations
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment · ICCV 2019 · 0 citations
Fashion++: Minimal Edits for Outfit Improvement · ICCV 2019 · 0 citations
nocaps: novel object captioning at scale · ICCV 2019 · 0 citations
Habitat: A Platform for Embodied AI Research · ICCV 2019 · 0 citations
Contrast and Classify: Training Robust VQA Models · ICCV 2021 (arXiv) · 0 citations
Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation · ICCV 2023 · 0 citations
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web · ECCV 2020 · 0 citations
Spatially Aware Multimodal Transformers for TextVQA · ECCV 2020 · 0 citations
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline · ECCV 2020 · 0 citations
Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation · ECCV 2020 · 0 citations
MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration · ECCV 2022 · 0 citations
Make-a-Scene: Scene-Based Text-to-Image Generation with Human Priors · ECCV 2022 · 0 citations
Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer · ECCV 2022 · 0 citations
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model · NeurIPS 2017 (arXiv) · 0 citations
Embodied Question Answering · CVPR 2018 (arXiv) · 0 citations
Understanding Image Virality · CVPR 2015 · 0 citations
RUBi: Reducing Unimodal Biases for Visual Question Answering · NeurIPS 2019 · 0 citations
Chasing Ghosts: Instruction Following as Bayesian State Tracking · NeurIPS 2019 · 0 citations
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks · NeurIPS 2019 · 0 citations
Cross-channel Communication Networks · NeurIPS 2019 · 0 citations
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data · NeurIPS 2020 · 0 citations
Human-Adversarial Visual Question Answering · NeurIPS 2021 · 0 citations
TarMAC: Targeted Multi-Agent Communication · ICML 2019 · 0 citations
Counterfactual Visual Explanations · ICML 2019 · 0 citations
Probabilistic Neural Symbolic Models for Interpretable Visual Question Answering · ICML 2019 · 0 citations