Anelia Angelova
25
Papers
279
Total Citations
Papers (25)
On Scaling Up a Multilingual Vision and Language Model
CVPR 2024
254
citations
Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities
CVPR 2024
25
citations
Evolving Losses for Unsupervised Video Representation Learning
CVPR 2020
0
citations
KeyPose: Multi-View 3D Labeling and Keypoint Estimation for Transparent Objects
CVPR 2020
0
citations
SMURF: Self-Teaching Multi-Frame Unsupervised RAFT With Full-Image Warping
CVPR 2021
0
citations
Taskology: Utilizing Task Relations at Scale
CVPR 2021
0
citations
Region-Aware Pretraining for Open-Vocabulary Object Detection With Vision Transformers
CVPR 2023
0
citations
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
CVPR 2023
0
citations
Evolving Space-Time Neural Architectures for Videos
ICCV 2019
0
citations
Depth From Videos in the Wild: Unsupervised Monocular Depth Learning From Unknown Cameras
ICCV 2019
0
citations
ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors
ICCV 2019
0
citations
4D-Net for Learned Multi-Modal Alignment
ICCV 2021
0
citations
Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval From a Single Image
ICCV 2021
0
citations
Contrastive Feature Masking Open-Vocabulary Vision Transformer
ICCV 2023
0
citations
Adversarial Generative Grammars for Human Activity Prediction
ECCV 2020
0
citations
What Matters in Unsupervised Optical Flow
ECCV 2020
0
citations
Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve
ECCV 2020
0
citations
AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification
ECCV 2020
0
citations
AssembleNet++: Assembling Modality Representations via Attention Connections
ECCV 2020
0
citations
Video Question Answering with Iterative Video-Text Co-Tokenization
ECCV 2022
0
citations
FindIt: Generalized Localization with Natural Language Queries
ECCV 2022
0
citations
VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models
CVPR 2025
0
citations
Unsupervised Learning of Depth and Ego-Motion From Monocular Video Using 3D Geometric Constraints
CVPR 2018
0
citations
TokenLearner: Adaptive Space-Time Tokenization for Videos
NeurIPS 2021
0
citations
Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs
ICML 2017
0
citations