Haoqi Fan
Papers: 24
Total citations: 15
Papers (24)
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
ICLR 2025
15
citations
Going Deeper into First-Person Activity Recognition
CVPR 2016
0
citations
Stacked Latent Attention for Multimodal Reasoning
CVPR 2018
0
citations
Long-Term Feature Banks for Detailed Video Understanding
CVPR 2019
0
citations
Momentum Contrast for Unsupervised Visual Representation Learning
CVPR 2020 (arXiv)
0
citations
Beyond Short Clips: End-to-End Video-Level Learning With Collaborative Memories
CVPR 2021 (arXiv)
0
citations
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
CVPR 2021 (arXiv)
0
citations
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
CVPR 2022 (arXiv)
0
citations
Reversible Vision Transformers
CVPR 2022
0
citations
Unified Transformer Tracker for Object Tracking
CVPR 2022 (arXiv)
0
citations
Masked Feature Prediction for Self-Supervised Visual Pre-Training
CVPR 2022 (arXiv)
0
citations
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
CVPR 2022 (arXiv)
0
citations
On the Importance of Asymmetry for Siamese Representation Learning
CVPR 2022 (arXiv)
0
citations
Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference
CVPR 2023
0
citations
Scaling Language-Image Pre-Training via Masking
CVPR 2023 (arXiv)
0
citations
Order-Aware Generative Modeling Using the 3D-Craft Dataset
ICCV 2019
0
citations
Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution
ICCV 2019
0
citations
SlowFast Networks for Video Recognition
ICCV 2019
0
citations
Multiscale Vision Transformers
ICCV 2021 (arXiv)
0
citations
HiT: Hierarchical Transformer With Momentum Contrast for Video-Text Retrieval
ICCV 2021 (arXiv)
0
citations
Multiview Pseudo-Labeling for Semi-Supervised Learning From Video
ICCV 2021 (arXiv)
0
citations
The Effectiveness of MAE Pre-Pretraining for Billion-Scale Pretraining
ICCV 2023 (arXiv)
0
citations
LLaVA-Critic: Learning to Evaluate Multimodal Models
CVPR 2025
0
citations
Diffusion Models as Masked Autoencoders
ICCV 2023 (arXiv)
0
citations