Ranjay Krishna

43 Papers
358 Total Citations

Papers (43)

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

CVPR 2025
96 citations

AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation

ICLR 2025
80 citations

SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World

CVPR 2024
52 citations

One Diffusion to Generate Them All

CVPR 2025
34 citations

Efficient Inference of Vision Instruction-Following Models with Elastic Cache

ECCV 2024
25 citations

Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion

ECCV 2024
23 citations

Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use

CVPR 2024
20 citations

Iterated Learning Improves Compositionality in Large Vision-Language Models

CVPR 2024
16 citations

Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model

CVPR 2025
9 citations

Convergent Functions, Divergent Forms

NeurIPS 2025
3 citations

Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models

CVPR 2024
0 citations

Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos

CVPR 2024
0 citations

Offline Training of Language Model Agents with Functions as Learnable Weights

ICML 2024
0 citations

Image Retrieval Using Scene Graphs

CVPR 2015
0 citations

A Hierarchical Approach for Generating Descriptive Image Paragraphs

CVPR 2017
0 citations

Referring Relationships

CVPR 2018
0 citations

Information Maximizing Visual Question Generation

CVPR 2019
0 citations

Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs

CVPR 2020
0 citations

AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning

CVPR 2021
0 citations

Measuring Compositional Consistency for Video Question Answering

CVPR 2022
0 citations

CREPE: Can Vision-Language Foundation Models Reason Compositionally?

CVPR 2023
0 citations

Dense-Captioning Events in Videos

ICCV 2017
0 citations

Scene Graph Prediction With Limited Labels

ICCV 2019
0 citations

Agile Modeling: From Concept to Classifier in Minutes

ICCV 2023
0 citations

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

ICCV 2023
0 citations

One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory

ICCV 2025
0 citations

RealEdit: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations

CVPR 2025
0 citations

Semantic and Expressive Variations in Image Captions Across Languages

CVPR 2025
0 citations

NVILA: Efficient Frontier Visual Language Models

CVPR 2025
0 citations

Perception Tokens Enhance Visual Reasoning in Multimodal Language Models

CVPR 2025
0 citations

Synthetic Visual Genome

CVPR 2025
0 citations

Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation

CVPR 2025
0 citations

PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology

ICCV 2025
0 citations

Contrastive Flow Matching

ICCV 2025
0 citations

Holodeck: Language Guided Generation of 3D Embodied AI Environments

CVPR 2024
0 citations

HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models

NeurIPS 2019
0 citations

ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward

NeurIPS 2022
0 citations

OBJECT 3DIT: Language-guided 3D-aware Image Editing

NeurIPS 2023
0 citations

DataComp: In search of the next generation of multimodal datasets

NeurIPS 2023
0 citations

SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality

NeurIPS 2023
0 citations

Quilt-1M: One Million Image-Text Pairs for Histopathology

NeurIPS 2023
0 citations

Cola: A Benchmark for Compositional Text-to-image Retrieval

NeurIPS 2023
0 citations

Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias

NeurIPS 2023
0 citations