Aniruddha Kembhavi

Papers: 10

Total Citations: 198

Papers (10)

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World

One Diffusion to Generate Them All

Iterated Learning Improves Compositionality in Large Vision-Language Models

Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences

Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

Seeing the Unseen: Visual Common Sense for Semantic Placement

ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams

Holodeck: Language Guided Generation of 3D Embodied AI Environments