Ranjay Krishna
23
Papers
358
Total Citations
Papers (23)
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
CVPR 2025
96
citations
AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation
ICLR 2025
80
citations
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
CVPR 2024
52
citations
One Diffusion to Generate Them All
CVPR 2025
34
citations
Efficient Inference of Vision Instruction-Following Models with Elastic Cache
ECCV 2024
25
citations
Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion
ECCV 2024
23
citations
Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use
CVPR 2024
20
citations
Iterated Learning Improves Compositionality in Large Vision-Language Models
CVPR 2024
16
citations
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model
CVPR 2025
9
citations
Convergent Functions, Divergent Forms
NeurIPS 2025
3
citations
Holodeck: Language Guided Generation of 3D Embodied AI Environments
CVPR 2024
0
citations
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
CVPR 2024
0
citations
Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos
CVPR 2024
0
citations
Offline Training of Language Model Agents with Functions as Learnable Weights
ICML 2024
0
citations
RealEdit: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations
CVPR 2025
0
citations
Semantic and Expressive Variations in Image Captions Across Languages
CVPR 2025
0
citations
NVILA: Efficient Frontier Visual Language Models
CVPR 2025
0
citations
Perception Tokens Enhance Visual Reasoning in Multimodal Language Models
CVPR 2025
0
citations
Synthetic Visual Genome
CVPR 2025
0
citations
Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation
CVPR 2025
0
citations
PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology
ICCV 2025
0
citations
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory
ICCV 2025
0
citations
Contrastive Flow Matching
ICCV 2025
0
citations