Jianfeng Gao
54
Papers
1,831
Total Citations
Papers (54)
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
ICLR 2024
1,171
citations
Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
ICLR 2024
372
citations
Is Self-Repair a Silver Bullet for Code Generation?
ICLR 2024
160
citations
Visual In-Context Prompting
CVPR 2024
52
citations
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
ICLR 2025
23
citations
DataGen: Unified Synthetic Dataset Generation via Large Language Models
ICLR 2025
21
citations
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
CVPR 2025
15
citations
Vector-ICL: In-context Learning with Continuous Vector Representations
ICLR 2025
10
citations
Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass
ICLR 2025
7
citations
Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation
CVPR 2019
0
citations
Tactical Rewind: Self-Correction via Backtracking in Vision-And-Language Navigation
CVPR 2019
0
citations
Object-Driven Text-To-Image Synthesis via Adversarial Training
CVPR 2019
0
citations
Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-Training
CVPR 2020
0
citations
VinVL: Revisiting Visual Representations in Vision-Language Models
CVPR 2021
0
citations
Grounded Language-Image Pre-Training
CVPR 2022
0
citations
RegionCLIP: Region-Based Language-Image Pretraining
CVPR 2022
0
citations
WebQA: Multihop and Multimodal QA
CVPR 2022
0
citations
Unified Contrastive Learning in Image-Text-Label Space
CVPR 2022
0
citations
Learning Customized Visual Models With Retrieval-Augmented Knowledge
CVPR 2023
0
citations
GLIGEN: Open-Set Grounded Text-to-Image Generation
CVPR 2023
0
citations
Generalized Decoding for Pixel, Image, and Language
CVPR 2023
0
citations
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding
ICCV 2021
0
citations
TACo: Token-Aware Cascade Contrastive Learning for Video-Text Alignment
ICCV 2021
0
citations
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
ECCV 2020
0
citations
End-to-end Learning of LDA by Mirror-Descent Back Propagation over a Deep Architecture
NeurIPS 2015
0
citations
From Captions to Visual Concepts and Back
CVPR 2015
0
citations
SITE: towards Spatial Intelligence Thorough Evaluation
ICCV 2025
0
citations
Position: TrustLLM: Trustworthiness in Large Language Models
ICML 2024
0
citations
Magma: A Foundation Model for Multimodal AI Agents
CVPR 2025
0
citations
Stacked Attention Networks for Image Question Answering
CVPR 2016
0
citations
StyleNet: Generating Attractive Visual Captions With Styles
CVPR 2017
0
citations
Semantic Compositional Networks for Visual Captioning
CVPR 2017
0
citations
Language-Based Image Editing With Recurrent Attentive Models
CVPR 2018
0
citations
StoryGAN: A Sequential Conditional GAN for Story Visualization
CVPR 2019
0
citations
Generating Informative and Diverse Conversational Responses via Adversarial Information Maximization
NeurIPS 2018
0
citations
Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models
NeurIPS 2018
0
citations
M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search
NeurIPS 2018
0
citations
Unified Language Model Pre-training for Natural Language Understanding and Generation
NeurIPS 2019
0
citations
Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
NeurIPS 2021
0
citations
Focal Attention for Long-Range Interactions in Vision Transformers
NeurIPS 2021
0
citations
Focal Modulation Networks
NeurIPS 2022
0
citations
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
NeurIPS 2022
0
citations
Fault-Aware Neural Code Rankers
NeurIPS 2022
0
citations
K-LITE: Learning Transferable Visual Models with External Knowledge
NeurIPS 2022
0
citations
Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models
NeurIPS 2022
0
citations
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
NeurIPS 2022
0
citations
GLIPv2: Unifying Localization and Vision-Language Understanding
NeurIPS 2022
0
citations
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
NeurIPS 2023
0
citations
Bridging Discrete and Backpropagation: Straight-Through and Beyond
NeurIPS 2023
0
citations
Segment Everything Everywhere All at Once
NeurIPS 2023
0
citations
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
NeurIPS 2023
0
citations
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
NeurIPS 2023
0
citations
Guiding Large Language Models via Directional Stimulus Prompting
NeurIPS 2023
0
citations
Augmenting Language Models with Long-Term Memory
NeurIPS 2023
0
citations