Trevor Darrell

128 papers, 2,492 total citations

Papers (128)

Toward Multimodal Image-to-Image Translation

NeurIPS 2017 (arXiv)
1,423
citations

Sequential Modeling Enables Scalable Learning for Large Vision Models

CVPR 2024
230
citations

Compositional Chain-of-Thought Prompting for Large Multimodal Models

CVPR 2024
167
citations

Navigation World Models

CVPR 2025 (arXiv)
136
citations

Self-correcting LLM-controlled Diffusion Models

CVPR 2024
95
citations

LLM-grounded Video Diffusion Models

ICLR 2024
76
citations

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

CVPR 2024
71
citations

When Do We Not Need Larger Vision Models?

ECCV 2024
70
citations

Describing Differences in Image Sets with Natural Language

CVPR 2024
51
citations

Describe Anything: Detailed Localized Image and Video Captioning

ICCV 2025
49
citations

VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation

CVPR 2024
36
citations

Pre-training Auto-regressive Robotic Models with 4D Representations

ICML 2025
19
citations

PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor

CVPR 2024
17
citations

VisionArena: 230k Real World User-VLM Conversations with Preference Labels

CVPR 2025
12
citations

Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling

NeurIPS 2025 (arXiv)
10
citations

Recursive Visual Programming

ECCV 2024
10
citations

Vision-Language Models Create Cross-Modal Task Representations

ICML 2025
7
citations

Dual-Process Image Generation

ICCV 2025
6
citations

LISAt: Language-Instructed Segmentation Assistant for Satellite Imagery

NeurIPS 2025
4
citations

Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features

ICCV 2025
3
citations

Compact Bilinear Pooling

CVPR 2016
0
citations

Learning With Side Information Through Modality Hallucination

CVPR 2016
0
citations

Context Encoders: Feature Learning by Inpainting

CVPR 2016
0
citations

Natural Language Object Retrieval

CVPR 2016
0
citations

Modeling Relationships in Referential Expressions With Compositional Modular Networks

CVPR 2017 (arXiv)
0
citations

End-To-End Learning of Driving Models From Large-Scale Video Datasets

CVPR 2017 (arXiv)
0
citations

Learning Features by Watching Objects Move

CVPR 2017 (arXiv)
0
citations

Captioning Images With Diverse Objects

CVPR 2017 (arXiv)
0
citations

Learning Detection With Diverse Proposals

CVPR 2017 (arXiv)
0
citations

Adversarial Discriminative Domain Adaptation

CVPR 2017 (arXiv)
0
citations

Deep Layer Aggregation

CVPR 2018 (arXiv)
0
citations

Learning to Segment Every Thing

CVPR 2018 (arXiv)
0
citations

Fooling Vision and Language Models Despite Localization and Attention Mechanism

CVPR 2018 (arXiv)
0
citations

Multi-Content GAN for Few-Shot Font Style Transfer

CVPR 2018 (arXiv)
0
citations

Multimodal Explanations: Justifying Decisions and Pointing to the Evidence

CVPR 2018 (arXiv)
0
citations

TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning

CVPR 2019
0
citations

Hierarchical Discrete Distribution Decomposition for Match Density Estimation

CVPR 2019
0
citations

Adversarial Inference for Multi-Sentence Video Description

CVPR 2019
0
citations

Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders

CVPR 2019
0
citations

BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning

CVPR 2020 (arXiv)
0
citations

Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks

CVPR 2020
0
citations

Advisable Learning for Self-Driving Vehicles by Internalizing Observation-to-Action Rules

CVPR 2020
0
citations

Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA

CVPR 2020 (arXiv)
0
citations

Learning Saliency Propagation for Semi-Supervised Instance Segmentation

CVPR 2020
0
citations

Quasi-Dense Similarity Learning for Multiple Object Tracking

CVPR 2021 (arXiv)
0
citations

Prototypical Cross-Domain Self-Supervised Learning for Few-Shot Unsupervised Domain Adaptation

CVPR 2021 (arXiv)
0
citations

Body2Hands: Learning To Infer 3D Hands From Conversational Gesture Body Dynamics

CVPR 2021 (arXiv)
0
citations

Learning Invariant Representations and Risks for Semi-Supervised Domain Adaptation

CVPR 2021 (arXiv)
0
citations

SelfAugment: Automatic Augmentation Policies for Self-Supervised Learning

CVPR 2021 (arXiv)
0
citations

DETReg: Unsupervised Pretraining With Region Priors for Object Detection

CVPR 2022 (arXiv)
0
citations

Contrastive Test-Time Adaptation

CVPR 2022 (arXiv)
0
citations

A ConvNet for the 2020s

CVPR 2022 (arXiv)
0
citations

Object-Region Video Transformers

CVPR 2022
0
citations

Learning To Listen: Modeling Non-Deterministic Dyadic Facial Motion

CVPR 2022 (arXiv)
0
citations

On Guiding Visual Attention With Language Specification

CVPR 2022 (arXiv)
0
citations

Back to the Source: Diffusion-Driven Adaptation To Test-Time Corruption

CVPR 2023
0
citations

Top-Down Visual Attention From Analysis by Synthesis

CVPR 2023 (arXiv)
0
citations

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation

ICCV 2015
0
citations

Spatial Semantic Regularisation for Large Scale Object Detection

ICCV 2015
0
citations

Learning The Structure of Deep Convolutional Networks

ICCV 2015
0
citations

Simultaneous Deep Transfer Across Domains and Tasks

ICCV 2015
0
citations

Sequence to Sequence - Video to Text

ICCV 2015
0
citations

Learning to Reason: End-To-End Module Networks for Visual Question Answering

ICCV 2017 (arXiv)
0
citations

Generalized Orderless Pooling Performs Implicit Salient Matching

ICCV 2017 (arXiv)
0
citations

Localizing Moments in Video With Natural Language

ICCV 2017 (arXiv)
0
citations

Robust Change Captioning

ICCV 2019
0
citations

Joint Monocular 3D Vehicle Detection and Tracking

ICCV 2019
0
citations

Variational Adversarial Active Learning

ICCV 2019
0
citations

Semi-Supervised Domain Adaptation via Minimax Entropy

ICCV 2019
0
citations

Few-Shot Object Detection via Feature Reweighting

ICCV 2019
0
citations

Disentangling Propagation and Generation for Video Prediction

ICCV 2019
0
citations

Language-Conditioned Graph Networks for Relational Reasoning

ICCV 2019
0
citations

Predicting With Confidence on Unseen Distributions

ICCV 2021 (arXiv)
0
citations

Temporal Action Detection With Multi-Level Supervision

ICCV 2021 (arXiv)
0
citations

Robust Object Detection via Instance-Level Temporal Cycle Confusion

ICCV 2021 (arXiv)
0
citations

Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning

ICCV 2021
0
citations

Region Similarity Representation Learning

ICCV 2021 (arXiv)
0
citations

Tune It the Right Way: Unsupervised Validation of Domain Adaptation via Soft Neighborhood Density

ICCV 2021 (arXiv)
0
citations

Rethinking Preventing Class-Collapsing in Metric Learning With Margin-Based Losses

ICCV 2021 (arXiv)
0
citations

Can Language Models Learn to Listen?

ICCV 2023 (arXiv)
0
citations

Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning

ICCV 2023
0
citations

Hierarchical Style-based Networks for Motion Synthesis

ECCV 2020
0
citations

Adversarial Continual Learning

ECCV 2020
0
citations

Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation

ECCV 2020
0
citations

Identity-Aware Multi-Sentence Video Description

ECCV 2020
0
citations

Learning Canonical Representations for Scene Graph to Image Generation

ECCV 2020
0
citations

Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning

ECCV 2020
0
citations

Studying Bias in GANs through the Lens of Race

ECCV 2022
0
citations

Learning to Detect Every Thing in an Open World

ECCV 2022
0
citations

TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency

ECCV 2022
0
citations

Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly

ECCV 2022
0
citations

Stochastic positional embeddings improve masked image modeling

ICML 2024
0
citations

Scaling Vision Pre-Training to 4K Resolution

CVPR 2025
0
citations

Visual Lexicon: Rich Image Features in Language Space

CVPR 2025
0
citations

Pose Priors from Language Models

CVPR 2025
0
citations

AutoPresent: Designing Structured Visuals from Scratch

CVPR 2025
0
citations

St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World

ICCV 2025
0
citations

Discovering Divergent Representations between Text-to-Image Models

ICCV 2025
0
citations

InstanceDiffusion: Instance-level Control for Image Generation

CVPR 2024
0
citations

See Say and Segment: Teaching LMMs to Overcome False Premises

CVPR 2024
0
citations

Unsupervised Universal Image Segmentation

CVPR 2024
0
citations

Readout Guidance: Learning Control from Diffusion Features

CVPR 2024
0
citations

Hyperbolic Active Learning for Semantic Segmentation under Domain Shift

ICML 2024
0
citations

xT: Nested Tokenization for Larger Context in Large Images

ICML 2024
0
citations

Position: Near to Mid-term Risks and Opportunities of Open-Source Generative AI

ICML 2024
0
citations

Deformable Part Models are Convolutional Neural Networks

CVPR 2015
0
citations

Long-Term Recurrent Convolutional Networks for Visual Recognition and Description

CVPR 2015
0
citations

Detector Discovery in the Wild: Joint Multiple Instance and Representation Learning

CVPR 2015
0
citations

Fully Convolutional Networks for Semantic Segmentation

CVPR 2015
0
citations

Deep Compositional Captioning: Describing Novel Object Categories Without Paired Training Data

CVPR 2016
0
citations

Neural Module Networks

CVPR 2016
0
citations

Speaker-Follower Models for Vision-and-Language Navigation

NeurIPS 2018
0
citations

Compositional Plan Vectors

NeurIPS 2019
0
citations

Learning to Control Self-Assembling Morphologies: A Study of Generalization via Modularity

NeurIPS 2019
0
citations

Fighting Copycat Agents in Behavioral Cloning from Observation Histories

NeurIPS 2020
0
citations

Auxiliary Task Reweighting for Minimum-data Learning

NeurIPS 2020
0
citations

Teachable Reinforcement Learning via Advice Distillation

NeurIPS 2021
0
citations

CLIP-It! Language-Guided Video Summarization

NeurIPS 2021
0
citations

Early Convolutions Help Transformers See Better

NeurIPS 2021
0
citations

K-LITE: Learning Transferable Visual Models with External Knowledge

NeurIPS 2022
0
citations

Visual Prompting via Image Inpainting

NeurIPS 2022
0
citations

Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens

NeurIPS 2022
0
citations

Hierarchical Open-vocabulary Universal Image Segmentation

NeurIPS 2023
0
citations

Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence

NeurIPS 2023
0
citations

Large Language Models are Visual Reasoning Coordinators

NeurIPS 2023
0
citations

Diversify Your Vision Datasets with Automatic Diffusion-based Augmentation

NeurIPS 2023
0
citations

Curiosity-driven Exploration by Self-supervised Prediction

ICML 2017
0
citations

CyCADA: Cycle-Consistent Adversarial Domain Adaptation

ICML 2018
0
citations