Haoqi Fan

24 Papers · 15 Total Citations

Papers (24)

Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
ICLR 2025 · 15 citations

Going Deeper into First-Person Activity Recognition
CVPR 2016 · 0 citations

Stacked Latent Attention for Multimodal Reasoning
CVPR 2018 · 0 citations

Long-Term Feature Banks for Detailed Video Understanding
CVPR 2019 · 0 citations

Momentum Contrast for Unsupervised Visual Representation Learning
CVPR 2020 (arXiv) · 0 citations

Beyond Short Clips: End-to-End Video-Level Learning With Collaborative Memories
CVPR 2021 (arXiv) · 0 citations

A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
CVPR 2021 (arXiv) · 0 citations

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
CVPR 2022 (arXiv) · 0 citations

Reversible Vision Transformers
CVPR 2022 · 0 citations

Unified Transformer Tracker for Object Tracking
CVPR 2022 (arXiv) · 0 citations

Masked Feature Prediction for Self-Supervised Visual Pre-Training
CVPR 2022 (arXiv) · 0 citations

MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
CVPR 2022 (arXiv) · 0 citations

On the Importance of Asymmetry for Siamese Representation Learning
CVPR 2022 (arXiv) · 0 citations

Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference
CVPR 2023 · 0 citations

Scaling Language-Image Pre-Training via Masking
CVPR 2023 (arXiv) · 0 citations

Order-Aware Generative Modeling Using the 3D-Craft Dataset
ICCV 2019 · 0 citations

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution
ICCV 2019 · 0 citations

SlowFast Networks for Video Recognition
ICCV 2019 · 0 citations

Multiscale Vision Transformers
ICCV 2021 (arXiv) · 0 citations

HiT: Hierarchical Transformer With Momentum Contrast for Video-Text Retrieval
ICCV 2021 (arXiv) · 0 citations

Multiview Pseudo-Labeling for Semi-Supervised Learning From Video
ICCV 2021 (arXiv) · 0 citations

The Effectiveness of MAE Pre-Pretraining for Billion-Scale Pretraining
ICCV 2023 (arXiv) · 0 citations

LLaVA-Critic: Learning to Evaluate Multimodal Models
CVPR 2025 · 0 citations

Diffusion Models as Masked Autoencoders
ICCV 2023 (arXiv) · 0 citations