Mubarak Shah

Papers: 19

Total Citations: 161

Papers (19)

Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors

M-LLM Based Video Frame Selection for Efficient Video Understanding

Composed Video Retrieval via Enriched Context and Discriminative Embeddings

X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs

VidLA: Video-Language Alignment at Scale

Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models

FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition

Open Vocabulary Multi-Label Video Classification

ALBAR: Adversarial Learning approach to mitigate Biases in Action Recognition

GT-Loc: Unifying When and Where in Images through a Joint Embedding Space

Beyond Simple Edits: Composed Video Retrieval with Dense Modifications

Möbius Transform for Mitigating Perspective Distortions in Representation Learning

Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning

CoLLM: A Large Language Model for Composed Image Retrieval

No More Shortcuts: Realizing the Potential of Temporal Self-Supervision

Test-Time Retrieval-Augmented Adaptation for Vision-Language Models

Curriculum Direct Preference Optimization for Diffusion and Consistency Models

Multiview Aerial Visual RECognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?

All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages