AJ Piergiovanni
17
Papers
279
Total Citations
Papers (17)
On Scaling Up a Multilingual Vision and Language Model
CVPR 2024
254
citations
Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities
CVPR 2024
25
citations
Representation Flow for Action Recognition
CVPR 2019
0
citations
Evolving Losses for Unsupervised Video Representation Learning
CVPR 2020, arXiv
0
citations
Recognizing Actions in Videos From Unseen Viewpoints
CVPR 2021, arXiv
0
citations
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
CVPR 2023, arXiv
0
citations
Evolving Space-Time Neural Architectures for Videos
ICCV 2019
0
citations
4D-Net for Learned Multi-Modal Alignment
ICCV 2021, arXiv
0
citations
Adversarial Generative Grammars for Human Activity Prediction
ECCV 2020
0
citations
AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification
ECCV 2020
0
citations
AssembleNet++: Assembling Modality Representations via Attention Connections
ECCV 2020
0
citations
Video Question Answering with Iterative Video-Text Co-Tokenization
ECCV 2022
0
citations
FindIt: Generalized Localization with Natural Language Queries
ECCV 2022
0
citations
VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models
CVPR 2025
0
citations
Learning Latent Super-Events to Detect Multiple Activities in Videos
CVPR 2018, arXiv
0
citations
AViD Dataset: Anonymized Videos from Diverse Countries
NeurIPS 2020
0
citations
TokenLearner: Adaptive Space-Time Tokenization for Videos
NeurIPS 2021
0
citations