Ali Farhadi

Papers: 68

Total Citations: 119

Papers (68)

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

DRAWER: Digital Reconstruction and Articulation With Environment Realism

Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos

Convergent Functions, Divergent Forms

NeurIPS 2025, arXiv

Discriminative and Consistent Similarities in Instance-Level Multiple Instance Learning

VisKE: Visual Knowledge Extraction and Question Answering by Visual Verification of Relation Phrases

You Only Look Once: Unified, Real-Time Object Detection

A Task-Oriented Approach for Cost-Sensitive Recognition

Actions ~ Transformations

Newtonian Scene Understanding: Unfolding the Dynamics of Objects in Static Images

Situation Recognition: Visual Semantic Role Labeling for Image Understanding

Asynchronous Temporal Fields for Action Recognition

Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension

Commonly Uncommon: Semantic Sparsity in Situation Recognition

YOLO9000: Better, Faster, Stronger

Structured Set Matching Networks for One-Shot Part Labeling

Who Let the Dogs Out? Modeling Dog Behavior From Visual Data

IQA: Visual Question Answering in Interactive Environments

SeGAN: Segmenting and Generating the Invisible

Actor and Observer: Joint Modeling of First and Third-Person Videos

ELASTIC: Improving CNNs With Dynamic Scaling Policies

OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

Two Body Problem: Collaborative Visual Task Completion

From Recognition to Cognition: Visual Commonsense Reasoning

Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning

Video Relationship Reasoning Using Gated Spatio-Temporal Energy Graph

Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects

RoboTHOR: An Open Simulation-to-Real Embodied AI Platform

Butterfly Transform: An Efficient FFT Based Neural Architecture Design

What's Hidden in a Randomly Weighted Neural Network?

Visual Reaction: Learning to Play Catch With Your Drone

Pushing It Out of the Way: Interactive Visual Navigation

Forward Compatible Training for Large-Scale Embedding Retrieval Systems

MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound

Robust Fine-Tuning of Zero-Shot Models

Objaverse: A Universe of Annotated 3D Objects

Phone2Proc: Bringing Robust Robots Into Our Chaotic World

Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing

Generating Notifications for Missing Actions: Don't Forget to Turn the Lights Off!

Visual Semantic Planning Using Deep Successor Representations

See the Glass Half Full: Reasoning About Liquid Containers, Their Volume and Content

What Does a Platypus Look Like? Generating Customized Prompts for Zero-Shot Image Classification

Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement

Grounded Situation Recognition

A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks

VisualCOMET: Reasoning about the Dynamic Context of a Still Image

Break and Make: Interactive Structural Understanding Using LEGO Bricks

Object Manipulation via Visual Target Localization

Visalogy: Answering Visual Analogy Questions

LCNN: Lookup-Based Convolutional Neural Network

Synthetic Visual Genome

Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation

Contrastive Flow Matching

Defending Against Neural Fake News

Discovering Neural Wirings

Supermasks in Superposition

MERLOT: Multimodal Neural Script Knowledge Models

LLC: Accurate, Multi-purpose Learnt Low-dimensional Binary Codes

Patching open-vocabulary models by interpolating weights

Matryoshka Representation Learning

Stable and low-precision training for large-scale vision-language models

Localized Symbolic Knowledge Distillation for Visual Commonsense Models

DataComp: In search of the next generation of multimodal datasets

Objaverse-XL: A Universe of 10M+ 3D Objects

Neural Priming for Sample-Efficient Adaptation

On the Connection between Pre-training Data Diversity and Fine-tuning Robustness

AdANNS: A Framework for Adaptive Semantic Search

Unsupervised Deep Embedding for Clustering Analysis