Juan Carlos Niebles
47
Papers
3,191
Total Citations
2
Affiliations
Affiliations
Salesforce
Stanford University
Papers (47)
ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding
CVPR 2015
2,814
citations
ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding
CVPR 2024
192
citations
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
ICLR 2024
104
citations
End-to-End Joint Semantic Segmentation of Actors and Actions in Video
ECCV 2018
36
citations
Re-thinking Temporal Search for Long-Form Video Understanding
CVPR 2025
36
citations
UniEgoMotion: A Unified Model for Egocentric Motion Reconstruction, Forecasting, and Generation
ICCV 2025
4
citations
Taming generative video models for zero-shot optical flow extraction
NeurIPS 2025
3
citations
ViUniT: Visual Unit Tests for More Robust Visual Programming
CVPR 2025
2
citations
Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos
CVPR 2018
0
citations
What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets
CVPR 2018
0
citations
D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation
CVPR 2019
0
citations
Peeking Into the Future: Predicting Future Person Activities and Locations in Videos
CVPR 2019
0
citations
Neural Task Graphs: Generalizing to Unseen Tasks From a Single Video Demonstration
CVPR 2019
0
citations
Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs
CVPR 2020
0
citations
Spatio-Temporal Graph for Video Captioning With Knowledge Distillation
CVPR 2020 (arXiv)
0
citations
Few-Shot Video Classification via Temporal Alignment
CVPR 2020 (arXiv)
0
citations
Metadata Normalization
CVPR 2021 (arXiv)
0
citations
Home Action Genome: Cooperative Compositional Action Understanding
CVPR 2021 (arXiv)
0
citations
Align and Prompt: Video-and-Language Pre-Training With Entity Prompts
CVPR 2022 (arXiv)
0
citations
Revisiting the "Video" in Video-Language Understanding
CVPR 2022
0
citations
ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding
CVPR 2023
0
citations
Procedure-Aware Pretraining for Instructional Video Understanding
CVPR 2023
0
citations
Mask-Free OVIS: Open-Vocabulary Instance Segmentation Without Manual Mask Annotations
CVPR 2023 (arXiv)
0
citations
Dense-Captioning Events in Videos
ICCV 2017 (arXiv)
0
citations
Visual Forecasting by Imitating Dynamics in Natural Sequences
ICCV 2017 (arXiv)
0
citations
Learning Temporal Action Proposals With Fewer Labels
ICCV 2019
0
citations
Imitation Learning for Human Pose Prediction
ICCV 2019
0
citations
TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild
ICCV 2021 (arXiv)
0
citations
Detecting Human-Object Relationships in Videos
ICCV 2021
0
citations
Learning Privacy-Preserving Optics for Human Pose Estimation
ICCV 2021
0
citations
Procedure Planning in Instructional Videos
ECCV 2020
0
citations
RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition
ECCV 2020
0
citations
PrivHAR: Recognizing Human Actions from Privacy-Preserving Lens
ECCV 2022
0
citations
Open Vocabulary Object Detection with Pseudo Bounding-Box Labels
ECCV 2022
0
citations
Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation
ICCV 2023 (arXiv)
0
citations
On the Relationship Between Visual Attributes and Convolutional Networks
CVPR 2015
0
citations
Robust Manhattan Frame Estimation From a Single RGB-D Image
CVPR 2015
0
citations
Fast Temporal Activity Proposals for Efficient Detection of Human Actions in Untrimmed Videos
CVPR 2016
0
citations
A Hierarchical Pose-Based Approach to Complex Action Understanding Using Dictionaries of Actionlets and Motion Poselets
CVPR 2016
0
citations
Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos
CVPR 2017 (arXiv)
0
citations
Agent-Centric Risk Assessment: Accident Anticipation and Risky Region Localization
CVPR 2017 (arXiv)
0
citations
SST: Single-Stream Temporal Action Proposals
CVPR 2017
0
citations
Learning to Decompose and Disentangle Representations for Video Prediction
NeurIPS 2018
0
citations
MOMA: Multi-Object Multi-Actor Activity Parsing
NeurIPS 2021
0
citations
MOMA-LRG: Language-Refined Graphs for Multi-Object Multi-Actor Activity Parsing
NeurIPS 2022
0
citations
Temporally Disentangled Representation Learning under Unknown Nonstationarity
NeurIPS 2023
0
citations
UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild
NeurIPS 2023
0
citations