Ranjay Krishna

23
Papers
358
Total Citations

Papers (23)

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

CVPR 2025
96
citations

AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation

ICLR 2025
80
citations

SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World

CVPR 2024
52
citations

One Diffusion to Generate Them All

CVPR 2025
34
citations

Efficient Inference of Vision Instruction-Following Models with Elastic Cache

ECCV 2024
25
citations

Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion

ECCV 2024
23
citations

Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use

CVPR 2024
20
citations

Iterated Learning Improves Compositionality in Large Vision-Language Models

CVPR 2024
16
citations

Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model

CVPR 2025 (arXiv)
9
citations

Convergent Functions, Divergent Forms

NeurIPS 2025 (arXiv)
3
citations

Holodeck: Language Guided Generation of 3D Embodied AI Environments

CVPR 2024
0
citations

Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models

CVPR 2024
0
citations

Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos

CVPR 2024
0
citations

Offline Training of Language Model Agents with Functions as Learnable Weights

ICML 2024
0
citations

RealEdit: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations

CVPR 2025
0
citations

Semantic and Expressive Variations in Image Captions Across Languages

CVPR 2025
0
citations

NVILA: Efficient Frontier Visual Language Models

CVPR 2025
0
citations

Perception Tokens Enhance Visual Reasoning in Multimodal Language Models

CVPR 2025
0
citations

Synthetic Visual Genome

CVPR 2025
0
citations

Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation

CVPR 2025
0
citations

PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology

ICCV 2025
0
citations

One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory

ICCV 2025
0
citations

Contrastive Flow Matching

ICCV 2025
0
citations