Anelia Angelova

Papers: 25

Total Citations: 279

Papers (25)

On Scaling Up a Multilingual Vision and Language Model

Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities

Evolving Losses for Unsupervised Video Representation Learning

KeyPose: Multi-View 3D Labeling and Keypoint Estimation for Transparent Objects

SMURF: Self-Teaching Multi-Frame Unsupervised RAFT With Full-Image Warping

Taskology: Utilizing Task Relations at Scale

Region-Aware Pretraining for Open-Vocabulary Object Detection With Vision Transformers

Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning

Evolving Space-Time Neural Architectures for Videos

Depth From Videos in the Wild: Unsupervised Monocular Depth Learning From Unknown Cameras

ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors

4D-Net for Learned Multi-Modal Alignment

Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval From a Single Image

Contrastive Feature Masking Open-Vocabulary Vision Transformer

Adversarial Generative Grammars for Human Activity Prediction

What Matters in Unsupervised Optical Flow

Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve

AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification

AssembleNet++: Assembling Modality Representations via Attention Connections

Video Question Answering with Iterative Video-Text Co-Tokenization

FindIt: Generalized Localization with Natural Language Queries

VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models

Unsupervised Learning of Depth and Ego-Motion From Monocular Video Using 3D Geometric Constraints

TokenLearner: Adaptive Space-Time Tokenization for Videos

Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs