Christoph Feichtenhofer

37 Papers
961 Total Citations

Papers (37)

Spatiotemporal Residual Networks for Video Action Recognition

NeurIPS 2016 (arXiv)
741
citations

Demystifying CLIP Data

ICLR 2024
205
citations

An Empirical Study of Autoregressive Pre-training from Videos

ICCV 2025
15
citations

Temporal Residual Networks for Dynamic Scene Recognition

CVPR 2017
0
citations

Spatiotemporal Multiplier Networks for Video Action Recognition

CVPR 2017
0
citations

What Have We Learned From Deep Representations for Action Recognition?

CVPR 2018 (arXiv)
0
citations

Long-Term Feature Banks for Detailed Video Understanding

CVPR 2019
0
citations

3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training

CVPR 2019
0
citations

A Multigrid Method for Efficiently Training Video Models

CVPR 2020 (arXiv)
0
citations

Ego-Topo: Environment Affordances From Egocentric Video

CVPR 2020
0
citations

X3D: Expanding Architectures for Efficient Video Recognition

CVPR 2020 (arXiv)
0
citations

A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

CVPR 2021 (arXiv)
0
citations

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition

CVPR 2022 (arXiv)
0
citations

Reversible Vision Transformers

CVPR 2022
0
citations

Masked Feature Prediction for Self-Supervised Visual Pre-Training

CVPR 2022 (arXiv)
0
citations

A ConvNet for the 2020s

CVPR 2022 (arXiv)
0
citations

MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

CVPR 2022 (arXiv)
0
citations

Ego4D: Around the World in 3,000 Hours of Egocentric Video

CVPR 2022
0
citations

On the Benefits of 3D Pose and Tracking for Human Action Recognition

CVPR 2023 (arXiv)
0
citations

Scaling Language-Image Pre-Training via Masking

CVPR 2023 (arXiv)
0
citations

Multiview Compressive Coding for 3D Reconstruction

CVPR 2023 (arXiv)
0
citations

Detect to Track and Track to Detect

ICCV 2017 (arXiv)
0
citations

SlowFast Networks for Video Recognition

ICCV 2019
0
citations

Grounded Human-Object Interaction Hotspots From Video

ICCV 2019
0
citations

Multiscale Vision Transformers

ICCV 2021 (arXiv)
0
citations

Multiview Pseudo-Labeling for Semi-Supervised Learning From Video

ICCV 2021 (arXiv)
0
citations

The Effectiveness of MAE Pre-Pretraining for Billion-Scale Pretraining

ICCV 2023 (arXiv)
0
citations

CiT: Curation in Training for Effective Vision-Language Data

ICCV 2023 (arXiv)
0
citations

Diffusion Models as Masked Autoencoders

ICCV 2023 (arXiv)
0
citations

TrackFormer: Multi-Object Tracking With Transformers

CVPR 2022
0
citations

Dynamically Encoded Actions Based on Spacetime Saliency

CVPR 2015
0
citations

Convolutional Two-Stream Network Fusion for Video Action Recognition

CVPR 2016
0
citations

Learning Temporal Pose Estimation from Sparsely-Labeled Videos

NeurIPS 2019
0
citations

Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers

NeurIPS 2021
0
citations

Masked Autoencoders that Listen

NeurIPS 2022
0
citations

Masked Autoencoders As Spatiotemporal Learners

NeurIPS 2022
0
citations

MAViL: Masked Audio-Video Learners

NeurIPS 2023
0
citations