Jianfeng Gao

54
Papers
1,831
Total Citations

Papers (54)

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

ICLR 2024
1,171
citations

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

ICLR 2024
372
citations

Is Self-Repair a Silver Bullet for Code Generation?

ICLR 2024
160
citations

Visual In-Context Prompting

CVPR 2024
52
citations

GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding

ICLR 2025
23
citations

DataGen: Unified Synthetic Dataset Generation via Large Language Models

ICLR 2025
21
citations

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

CVPR 2025
15
citations

Vector-ICL: In-context Learning with Continuous Vector Representations

ICLR 2025
10
citations

Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass

ICLR 2025
7
citations

Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation

CVPR 2019
0
citations

Tactical Rewind: Self-Correction via Backtracking in Vision-And-Language Navigation

CVPR 2019
0
citations

Object-Driven Text-To-Image Synthesis via Adversarial Training

CVPR 2019
0
citations

Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-Training

CVPR 2020
0
citations

VinVL: Revisiting Visual Representations in Vision-Language Models

CVPR 2021
0
citations

Grounded Language-Image Pre-Training

CVPR 2022
0
citations

RegionCLIP: Region-Based Language-Image Pretraining

CVPR 2022
0
citations

WebQA: Multihop and Multimodal QA

CVPR 2022
0
citations

Unified Contrastive Learning in Image-Text-Label Space

CVPR 2022
0
citations

Learning Customized Visual Models With Retrieval-Augmented Knowledge

CVPR 2023
0
citations

GLIGEN: Open-Set Grounded Text-to-Image Generation

CVPR 2023
0
citations

Generalized Decoding for Pixel, Image, and Language

CVPR 2023
0
citations

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding

ICCV 2021
0
citations

TACo: Token-Aware Cascade Contrastive Learning for Video-Text Alignment

ICCV 2021
0
citations

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

ECCV 2020
0
citations

End-to-end Learning of LDA by Mirror-Descent Back Propagation over a Deep Architecture

NeurIPS 2015
0
citations

From Captions to Visual Concepts and Back

CVPR 2015
0
citations

SITE: towards Spatial Intelligence Thorough Evaluation

ICCV 2025
0
citations

Position: TrustLLM: Trustworthiness in Large Language Models

ICML 2024
0
citations

Magma: A Foundation Model for Multimodal AI Agents

CVPR 2025
0
citations

Stacked Attention Networks for Image Question Answering

CVPR 2016
0
citations

StyleNet: Generating Attractive Visual Captions With Styles

CVPR 2017
0
citations

Semantic Compositional Networks for Visual Captioning

CVPR 2017
0
citations

Language-Based Image Editing With Recurrent Attentive Models

CVPR 2018
0
citations

StoryGAN: A Sequential Conditional GAN for Story Visualization

CVPR 2019
0
citations

Generating Informative and Diverse Conversational Responses via Adversarial Information Maximization

NeurIPS 2018
0
citations

Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models

NeurIPS 2018
0
citations

M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search

NeurIPS 2018
0
citations

Unified Language Model Pre-training for Natural Language Understanding and Generation

NeurIPS 2019
0
citations

Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

NeurIPS 2021
0
citations

Focal Attention for Long-Range Interactions in Vision Transformers

NeurIPS 2021
0
citations

Focal Modulation Networks

NeurIPS 2022
0
citations

ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models

NeurIPS 2022
0
citations

Fault-Aware Neural Code Rankers

NeurIPS 2022
0
citations

K-LITE: Learning Transferable Visual Models with External Knowledge

NeurIPS 2022
0
citations

Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models

NeurIPS 2022
0
citations

Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone

NeurIPS 2022
0
citations

GLIPv2: Unifying Localization and Vision-Language Understanding

NeurIPS 2022
0
citations

Localized Symbolic Knowledge Distillation for Visual Commonsense Models

NeurIPS 2023
0
citations

Bridging Discrete and Backpropagation: Straight-Through and Beyond

NeurIPS 2023
0
citations

Segment Everything Everywhere All at Once

NeurIPS 2023
0
citations

LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day

NeurIPS 2023
0
citations

Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models

NeurIPS 2023
0
citations

Guiding Large Language Models via Directional Stimulus Prompting

NeurIPS 2023
0
citations

Augmenting Language Models with Long-Term Memory

NeurIPS 2023
0
citations