Li Fei-Fei
68
Papers
595
Total Citations
Papers (68)
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
CVPR 2025
342
citations
Learning Semantic Relationships for Better Action Retrieval in Images
CVPR 2015
114
citations
ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image
CVPR 2024
85
citations
Re-thinking Temporal Search for Long-Form Video Understanding
CVPR 2025
36
citations
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
CVPR 2024
14
citations
Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation
ICCV 2025
4
citations
Image Retrieval Using Scene Graphs
CVPR 2015
0
citations
Fine-Grained Recognition Without Part Annotations
CVPR 2015
0
citations
Social LSTM: Human Trajectory Prediction in Crowded Spaces
CVPR 2016
0
citations
Recurrent Attention Models for Depth-Based Person Identification
CVPR 2016
0
citations
End-To-End Learning of Action Detection From Frame Glimpses in Videos
CVPR 2016
0
citations
Detecting Events and Key Actors in Multi-Person Videos
CVPR 2016
0
citations
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
CVPR 2016
0
citations
Visual7W: Grounded Question Answering in Images
CVPR 2016
0
citations
A Hierarchical Approach for Generating Descriptive Image Paragraphs
CVPR 2017
0
citations
Knowledge Acquisition for Visual Question Answering via Iterative Querying
CVPR 2017
0
citations
Jointly Learning Energy Expenditures and Activities Using Egocentric Multimodal Signals
CVPR 2017
0
citations
Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos
CVPR 2017
0
citations
Unsupervised Learning of Long-Term Motion Dynamics for Videos
CVPR 2017
0
citations
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
CVPR 2017
0
citations
Learning to Learn From Noisy Web Videos
CVPR 2017
0
citations
Scene Graph Generation by Iterative Message Passing
CVPR 2017
0
citations
Image Generation From Scene Graphs
CVPR 2018
0
citations
Social GAN: Socially Acceptable Trajectories With Generative Adversarial Networks
CVPR 2018
0
citations
Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos
CVPR 2018
0
citations
Referring Relationships
CVPR 2018
0
citations
Iterative Visual Reasoning Beyond Convolutions
CVPR 2018
0
citations
What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets
CVPR 2018
0
citations
Thoracic Disease Identification and Localization With Limited Supervision
CVPR 2018
0
citations
Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation
CVPR 2019
0
citations
Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks
CVPR 2019
0
citations
Information Maximizing Visual Question Generation
CVPR 2019
0
citations
DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion
CVPR 2019
0
citations
D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation
CVPR 2019
0
citations
Peeking Into the Future: Predicting Future Person Activities and Locations in Videos
CVPR 2019
0
citations
Composing Text and Image for Image Retrieval - an Empirical Odyssey
CVPR 2019
0
citations
Neural Task Graphs: Generalizing to Unseen Tasks From a Single Video Demonstration
CVPR 2019
0
citations
Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs
CVPR 2020
0
citations
Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction
CVPR 2021
0
citations
Metadata Normalization
CVPR 2021
0
citations
Scalable Differential Privacy With Sparse Network Finetuning
CVPR 2021
0
citations
ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
CVPR 2022
0
citations
Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning
CVPR 2022
0
citations
Revisiting the "Video" in Video-Language Understanding
CVPR 2022
0
citations
The ObjectFolder Benchmark: Multisensory Learning With Neural and Real Objects
CVPR 2023
0
citations
The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion
CVPR 2025
0
citations
RGB-W: When Vision Meets Wireless
ICCV 2015
0
citations
Learning Temporal Embeddings for Complex Video Analysis
ICCV 2015
0
citations
Love Thy Neighbors: Image Annotation by Exploiting Image Metadata
ICCV 2015
0
citations
Visual Semantic Planning Using Deep Successor Representations
ICCV 2017
0
citations
Dense-Captioning Events in Videos
ICCV 2017
0
citations
Fine-Grained Recognition in the Wild: A Multi-Task Domain Adaptation Approach
ICCV 2017
0
citations
Inferring and Executing Programs for Visual Reasoning
ICCV 2017
0
citations
Characterizing and Improving Stability in Neural Style Transfer
ICCV 2017
0
citations
Scene Graph Prediction With Limited Labels
ICCV 2019
0
citations
Situational Fusion of Visual Representation for Visual Navigation
ICCV 2019
0
citations
Rendering Humans from Object-Occluded Monocular Videos
ICCV 2023
0
citations
Procedure Planning in Instructional Videos
ECCV 2020
0
citations
RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition
ECCV 2020
0
citations
PrivHAR: Recognizing Human Actions from Privacy-Preserving Lens
ECCV 2022
0
citations
Improving Image Classification With Location Context
ICCV 2015
0
citations
Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization
ICCV 2025
0
citations
WorldScore: Unified Evaluation Benchmark for World Generation
ICCV 2025
0
citations
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
ICML 2024
0
citations
Best of Both Worlds: Human-Machine Collaboration for Object Annotation
CVPR 2015
0
citations
Deep Visual-Semantic Alignments for Generating Image Descriptions
CVPR 2015
0
citations
MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels
ICML 2018
0
citations
Distributed Asynchronous Optimization with Unbounded Delays: How Slow Can You Go?
ICML 2018
0
citations