Trevor Darrell

33
Papers
1,069
Total Citations

Papers (33)

Sequential Modeling Enables Scalable Learning for Large Vision Models

CVPR 2024
230
citations

Compositional Chain-of-Thought Prompting for Large Multimodal Models

CVPR 2024
167
citations

Navigation World Models

CVPR 2025
136
citations

Self-correcting LLM-controlled Diffusion Models

CVPR 2024
95
citations

LLM-grounded Video Diffusion Models

ICLR 2024
76
citations

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

CVPR 2024
71
citations

When Do We Not Need Larger Vision Models?

ECCV 2024
70
citations

Describing Differences in Image Sets with Natural Language

CVPR 2024
51
citations

Describe Anything: Detailed Localized Image and Video Captioning

ICCV 2025
49
citations

VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation

CVPR 2024
36
citations

Pre-training Auto-regressive Robotic Models with 4D Representations

ICML 2025
19
citations

PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor

CVPR 2024
17
citations

VisionArena: 230k Real World User-VLM Conversations with Preference Labels

CVPR 2025
12
citations

Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling

NeurIPS 2025
10
citations

Recursive Visual Programming

ECCV 2024
10
citations

Vision-Language Models Create Cross-Modal Task Representations

ICML 2025
7
citations

Dual-Process Image Generation

ICCV 2025
6
citations

LISAt: Language-Instructed Segmentation Assistant for Satellite Imagery

NeurIPS 2025
4
citations

Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features

ICCV 2025
3
citations

Stochastic positional embeddings improve masked image modeling

ICML 2024
0
citations

Scaling Vision Pre-Training to 4K Resolution

CVPR 2025
0
citations

Visual Lexicon: Rich Image Features in Language Space

CVPR 2025
0
citations

Pose Priors from Language Models

CVPR 2025
0
citations

AutoPresent: Designing Structured Visuals from Scratch

CVPR 2025
0
citations

St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World

ICCV 2025
0
citations

Discovering Divergent Representations between Text-to-Image Models

ICCV 2025
0
citations

InstanceDiffusion: Instance-level Control for Image Generation

CVPR 2024
0
citations

See Say and Segment: Teaching LMMs to Overcome False Premises

CVPR 2024
0
citations

Unsupervised Universal Image Segmentation

CVPR 2024
0
citations

Readout Guidance: Learning Control from Diffusion Features

CVPR 2024
0
citations

Hyperbolic Active Learning for Semantic Segmentation under Domain Shift

ICML 2024
0
citations

xT: Nested Tokenization for Larger Context in Large Images

ICML 2024
0
citations

Position: Near to Mid-term Risks and Opportunities of Open-Source Generative AI

ICML 2024
0
citations