Juan Carlos Niebles

47
Papers
3,191
Total Citations
2
Affiliations

Affiliations

Salesforce, Stanford University

Papers (47)

ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding

CVPR 2015
2,814
citations

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

CVPR 2024
192
citations

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

ICLR 2024
104
citations

End-to-End Joint Semantic Segmentation of Actors and Actions in Video

ECCV 2018
36
citations

Re-thinking Temporal Search for Long-Form Video Understanding

CVPR 2025
36
citations

UniEgoMotion: A Unified Model for Egocentric Motion Reconstruction, Forecasting, and Generation

ICCV 2025
4
citations

Taming generative video models for zero-shot optical flow extraction

NeurIPS 2025
3
citations

ViUniT: Visual Unit Tests for More Robust Visual Programming

CVPR 2025
2
citations

Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos

CVPR 2018
0
citations

What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets

CVPR 2018
0
citations

D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation

CVPR 2019
0
citations

Peeking Into the Future: Predicting Future Person Activities and Locations in Videos

CVPR 2019
0
citations

Neural Task Graphs: Generalizing to Unseen Tasks From a Single Video Demonstration

CVPR 2019
0
citations

Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs

CVPR 2020
0
citations

Spatio-Temporal Graph for Video Captioning With Knowledge Distillation

CVPR 2020, arXiv
0
citations

Few-Shot Video Classification via Temporal Alignment

CVPR 2020, arXiv
0
citations

Metadata Normalization

CVPR 2021, arXiv
0
citations

Home Action Genome: Cooperative Compositional Action Understanding

CVPR 2021, arXiv
0
citations

Align and Prompt: Video-and-Language Pre-Training With Entity Prompts

CVPR 2022, arXiv
0
citations

Revisiting the "Video" in Video-Language Understanding

CVPR 2022
0
citations

ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding

CVPR 2023
0
citations

Procedure-Aware Pretraining for Instructional Video Understanding

CVPR 2023
0
citations

Mask-Free OVIS: Open-Vocabulary Instance Segmentation Without Manual Mask Annotations

CVPR 2023, arXiv
0
citations

Dense-Captioning Events in Videos

ICCV 2017, arXiv
0
citations

Visual Forecasting by Imitating Dynamics in Natural Sequences

ICCV 2017, arXiv
0
citations

Learning Temporal Action Proposals With Fewer Labels

ICCV 2019
0
citations

Imitation Learning for Human Pose Prediction

ICCV 2019
0
citations

TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild

ICCV 2021, arXiv
0
citations

Detecting Human-Object Relationships in Videos

ICCV 2021
0
citations

Learning Privacy-Preserving Optics for Human Pose Estimation

ICCV 2021
0
citations

Procedure Planning in Instructional Videos

ECCV 2020
0
citations

RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition

ECCV 2020
0
citations

PrivHAR: Recognizing Human Actions from Privacy-Preserving Lens

ECCV 2022
0
citations

Open Vocabulary Object Detection with Pseudo Bounding-Box Labels

ECCV 2022
0
citations

Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation

ICCV 2023, arXiv
0
citations

On the Relationship Between Visual Attributes and Convolutional Networks

CVPR 2015
0
citations

Robust Manhattan Frame Estimation From a Single RGB-D Image

CVPR 2015
0
citations

Fast Temporal Activity Proposals for Efficient Detection of Human Actions in Untrimmed Videos

CVPR 2016
0
citations

A Hierarchical Pose-Based Approach to Complex Action Understanding Using Dictionaries of Actionlets and Motion Poselets

CVPR 2016
0
citations

Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos

CVPR 2017, arXiv
0
citations

Agent-Centric Risk Assessment: Accident Anticipation and Risky Region Localization

CVPR 2017, arXiv
0
citations

SST: Single-Stream Temporal Action Proposals

CVPR 2017
0
citations

Learning to Decompose and Disentangle Representations for Video Prediction

NeurIPS 2018
0
citations

MOMA: Multi-Object Multi-Actor Activity Parsing

NeurIPS 2021
0
citations

MOMA-LRG: Language-Refined Graphs for Multi-Object Multi-Actor Activity Parsing

NeurIPS 2022
0
citations

Temporally Disentangled Representation Learning under Unknown Nonstationarity

NeurIPS 2023
0
citations

UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild

NeurIPS 2023
0
citations