Trevor Darrell
128 Papers
2,492 Total Citations
Papers (128)
Toward Multimodal Image-to-Image Translation
NeurIPS 2017 (arXiv)
1,423
citations
Sequential Modeling Enables Scalable Learning for Large Vision Models
CVPR 2024
230
citations
Compositional Chain-of-Thought Prompting for Large Multimodal Models
CVPR 2024
167
citations
Navigation World Models
CVPR 2025 (arXiv)
136
citations
Self-correcting LLM-controlled Diffusion Models
CVPR 2024
95
citations
LLM-grounded Video Diffusion Models
ICLR 2024
76
citations
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
CVPR 2024
71
citations
When Do We Not Need Larger Vision Models?
ECCV 2024
70
citations
Describing Differences in Image Sets with Natural Language
CVPR 2024
51
citations
Describe Anything: Detailed Localized Image and Video Captioning
ICCV 2025
49
citations
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation
CVPR 2024
36
citations
Pre-training Auto-regressive Robotic Models with 4D Representations
ICML 2025
19
citations
PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor
CVPR 2024
17
citations
VisionArena: 230k Real World User-VLM Conversations with Preference Labels
CVPR 2025
12
citations
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling
NeurIPS 2025 (arXiv)
10
citations
Recursive Visual Programming
ECCV 2024
10
citations
Vision-Language Models Create Cross-Modal Task Representations
ICML 2025
7
citations
Dual-Process Image Generation
ICCV 2025
6
citations
LISAt: Language-Instructed Segmentation Assistant for Satellite Imagery
NeurIPS 2025
4
citations
Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features
ICCV 2025
3
citations
Compact Bilinear Pooling
CVPR 2016
0
citations
Learning With Side Information Through Modality Hallucination
CVPR 2016
0
citations
Context Encoders: Feature Learning by Inpainting
CVPR 2016
0
citations
Natural Language Object Retrieval
CVPR 2016
0
citations
Modeling Relationships in Referential Expressions With Compositional Modular Networks
CVPR 2017 (arXiv)
0
citations
End-To-End Learning of Driving Models From Large-Scale Video Datasets
CVPR 2017 (arXiv)
0
citations
Learning Features by Watching Objects Move
CVPR 2017 (arXiv)
0
citations
Captioning Images With Diverse Objects
CVPR 2017 (arXiv)
0
citations
Learning Detection With Diverse Proposals
CVPR 2017 (arXiv)
0
citations
Adversarial Discriminative Domain Adaptation
CVPR 2017 (arXiv)
0
citations
Deep Layer Aggregation
CVPR 2018 (arXiv)
0
citations
Learning to Segment Every Thing
CVPR 2018 (arXiv)
0
citations
Fooling Vision and Language Models Despite Localization and Attention Mechanism
CVPR 2018 (arXiv)
0
citations
Multi-Content GAN for Few-Shot Font Style Transfer
CVPR 2018 (arXiv)
0
citations
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence
CVPR 2018 (arXiv)
0
citations
TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning
CVPR 2019
0
citations
Hierarchical Discrete Distribution Decomposition for Match Density Estimation
CVPR 2019
0
citations
Adversarial Inference for Multi-Sentence Video Description
CVPR 2019
0
citations
Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders
CVPR 2019
0
citations
BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning
CVPR 2020 (arXiv)
0
citations
Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks
CVPR 2020
0
citations
Advisable Learning for Self-Driving Vehicles by Internalizing Observation-to-Action Rules
CVPR 2020
0
citations
Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA
CVPR 2020 (arXiv)
0
citations
Learning Saliency Propagation for Semi-Supervised Instance Segmentation
CVPR 2020
0
citations
Quasi-Dense Similarity Learning for Multiple Object Tracking
CVPR 2021 (arXiv)
0
citations
Prototypical Cross-Domain Self-Supervised Learning for Few-Shot Unsupervised Domain Adaptation
CVPR 2021 (arXiv)
0
citations
Body2Hands: Learning To Infer 3D Hands From Conversational Gesture Body Dynamics
CVPR 2021 (arXiv)
0
citations
Learning Invariant Representations and Risks for Semi-Supervised Domain Adaptation
CVPR 2021 (arXiv)
0
citations
SelfAugment: Automatic Augmentation Policies for Self-Supervised Learning
CVPR 2021 (arXiv)
0
citations
DETReg: Unsupervised Pretraining With Region Priors for Object Detection
CVPR 2022 (arXiv)
0
citations
Contrastive Test-Time Adaptation
CVPR 2022 (arXiv)
0
citations
A ConvNet for the 2020s
CVPR 2022 (arXiv)
0
citations
Object-Region Video Transformers
CVPR 2022
0
citations
Learning To Listen: Modeling Non-Deterministic Dyadic Facial Motion
CVPR 2022 (arXiv)
0
citations
On Guiding Visual Attention With Language Specification
CVPR 2022 (arXiv)
0
citations
Back to the Source: Diffusion-Driven Adaptation To Test-Time Corruption
CVPR 2023
0
citations
Top-Down Visual Attention From Analysis by Synthesis
CVPR 2023 (arXiv)
0
citations
Constrained Convolutional Neural Networks for Weakly Supervised Segmentation
ICCV 2015
0
citations
Spatial Semantic Regularisation for Large Scale Object Detection
ICCV 2015
0
citations
Learning The Structure of Deep Convolutional Networks
ICCV 2015
0
citations
Simultaneous Deep Transfer Across Domains and Tasks
ICCV 2015
0
citations
Sequence to Sequence - Video to Text
ICCV 2015
0
citations
Learning to Reason: End-To-End Module Networks for Visual Question Answering
ICCV 2017 (arXiv)
0
citations
Generalized Orderless Pooling Performs Implicit Salient Matching
ICCV 2017 (arXiv)
0
citations
Localizing Moments in Video With Natural Language
ICCV 2017 (arXiv)
0
citations
Robust Change Captioning
ICCV 2019
0
citations
Joint Monocular 3D Vehicle Detection and Tracking
ICCV 2019
0
citations
Variational Adversarial Active Learning
ICCV 2019
0
citations
Semi-Supervised Domain Adaptation via Minimax Entropy
ICCV 2019
0
citations
Few-Shot Object Detection via Feature Reweighting
ICCV 2019
0
citations
Disentangling Propagation and Generation for Video Prediction
ICCV 2019
0
citations
Language-Conditioned Graph Networks for Relational Reasoning
ICCV 2019
0
citations
Predicting With Confidence on Unseen Distributions
ICCV 2021 (arXiv)
0
citations
Temporal Action Detection With Multi-Level Supervision
ICCV 2021 (arXiv)
0
citations
Robust Object Detection via Instance-Level Temporal Cycle Confusion
ICCV 2021 (arXiv)
0
citations
Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning
ICCV 2021
0
citations
Region Similarity Representation Learning
ICCV 2021 (arXiv)
0
citations
Tune It the Right Way: Unsupervised Validation of Domain Adaptation via Soft Neighborhood Density
ICCV 2021 (arXiv)
0
citations
Rethinking Preventing Class-Collapsing in Metric Learning With Margin-Based Losses
ICCV 2021 (arXiv)
0
citations
Can Language Models Learn to Listen?
ICCV 2023 (arXiv)
0
citations
Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning
ICCV 2023
0
citations
Hierarchical Style-based Networks for Motion Synthesis
ECCV 2020
0
citations
Adversarial Continual Learning
ECCV 2020
0
citations
Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation
ECCV 2020
0
citations
Identity-Aware Multi-Sentence Video Description
ECCV 2020
0
citations
Learning Canonical Representations for Scene Graph to Image Generation
ECCV 2020
0
citations
Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning
ECCV 2020
0
citations
Studying Bias in GANs through the Lens of Race
ECCV 2022
0
citations
Learning to Detect Every Thing in an Open World
ECCV 2022
0
citations
TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency
ECCV 2022
0
citations
Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly
ECCV 2022
0
citations
Stochastic positional embeddings improve masked image modeling
ICML 2024
0
citations
Scaling Vision Pre-Training to 4K Resolution
CVPR 2025
0
citations
Visual Lexicon: Rich Image Features in Language Space
CVPR 2025
0
citations
Pose Priors from Language Models
CVPR 2025
0
citations
AutoPresent: Designing Structured Visuals from Scratch
CVPR 2025
0
citations
St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World
ICCV 2025
0
citations
Discovering Divergent Representations between Text-to-Image Models
ICCV 2025
0
citations
InstanceDiffusion: Instance-level Control for Image Generation
CVPR 2024
0
citations
See Say and Segment: Teaching LMMs to Overcome False Premises
CVPR 2024
0
citations
Unsupervised Universal Image Segmentation
CVPR 2024
0
citations
Readout Guidance: Learning Control from Diffusion Features
CVPR 2024
0
citations
Hyperbolic Active Learning for Semantic Segmentation under Domain Shift
ICML 2024
0
citations
xT: Nested Tokenization for Larger Context in Large Images
ICML 2024
0
citations
Position: Near to Mid-term Risks and Opportunities of Open-Source Generative AI
ICML 2024
0
citations
Deformable Part Models are Convolutional Neural Networks
CVPR 2015
0
citations
Long-Term Recurrent Convolutional Networks for Visual Recognition and Description
CVPR 2015
0
citations
Detector Discovery in the Wild: Joint Multiple Instance and Representation Learning
CVPR 2015
0
citations
Fully Convolutional Networks for Semantic Segmentation
CVPR 2015
0
citations
Deep Compositional Captioning: Describing Novel Object Categories Without Paired Training Data
CVPR 2016
0
citations
Neural Module Networks
CVPR 2016
0
citations
Speaker-Follower Models for Vision-and-Language Navigation
NeurIPS 2018
0
citations
Compositional Plan Vectors
NeurIPS 2019
0
citations
Learning to Control Self-Assembling Morphologies: A Study of Generalization via Modularity
NeurIPS 2019
0
citations
Fighting Copycat Agents in Behavioral Cloning from Observation Histories
NeurIPS 2020
0
citations
Auxiliary Task Reweighting for Minimum-data Learning
NeurIPS 2020
0
citations
Teachable Reinforcement Learning via Advice Distillation
NeurIPS 2021
0
citations
CLIP-It! Language-Guided Video Summarization
NeurIPS 2021
0
citations
Early Convolutions Help Transformers See Better
NeurIPS 2021
0
citations
K-LITE: Learning Transferable Visual Models with External Knowledge
NeurIPS 2022
0
citations
Visual Prompting via Image Inpainting
NeurIPS 2022
0
citations
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens
NeurIPS 2022
0
citations
Hierarchical Open-vocabulary Universal Image Segmentation
NeurIPS 2023
0
citations
Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence
NeurIPS 2023
0
citations
Large Language Models are Visual Reasoning Coordinators
NeurIPS 2023
0
citations
Diversify Your Vision Datasets with Automatic Diffusion-based Augmentation
NeurIPS 2023
0
citations
Curiosity-driven Exploration by Self-supervised Prediction
ICML 2017
0
citations
CyCADA: Cycle-Consistent Adversarial Domain Adaptation
ICML 2018
0
citations