Jiasen Lu
16
Papers
1,852
Total Citations
Papers (16)
Hierarchical Question-Image Co-Attention for Visual Question Answering
NeurIPS 2016 (arXiv)
1,702
citations
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
CVPR 2025
96
citations
One Diffusion to Generate Them All
CVPR 2025
34
citations
STIV: Scalable Text and Image Conditioned Video Generation
ICCV 2025
20
citations
Neural Baby Talk
CVPR 2018 (arXiv)
0
citations
12-in-1: Multi-Task Vision and Language Representation Learning
CVPR 2020
0
citations
VQA: Visual Question Answering
ICCV 2015
0
citations
Spatially Aware Multimodal Transformers for TextVQA
ECCV 2020
0
citations
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
NeurIPS 2017 (arXiv)
0
citations
MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound
CVPR 2022 (arXiv)
0
citations
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
CVPR 2024
0
citations
Human Action Segmentation With Hierarchical Supervoxel Consistency
CVPR 2015
0
citations
Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning
CVPR 2017 (arXiv)
0
citations
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
NeurIPS 2019
0
citations
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data
NeurIPS 2020
0
citations
Container: Context Aggregation Networks
NeurIPS 2021
0
citations