Mubarak Shah

21 Papers

188 Total Citations

Papers (21)

Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors

M-LLM Based Video Frame Selection for Efficient Video Understanding

Curriculum Direct Preference Optimization for Diffusion and Consistency Models

Composed Video Retrieval via Enriched Context and Discriminative Embeddings

X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs

VidLA: Video-Language Alignment at Scale

Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models

Open Vocabulary Multi-Label Video Classification

FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition

GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers

ALBAR: Adversarial Learning approach to mitigate Biases in Action Recognition

GT-Loc: Unifying When and Where in Images through a Joint Embedding Space

From Play to Replay: Composed Video Retrieval for Temporally Fine-Grained Videos

NeurIPS 2025, arXiv

Beyond Simple Edits: Composed Video Retrieval with Dense Modifications

Möbius Transform for Mitigating Perspective Distortions in Representation Learning

All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

No More Shortcuts: Realizing the Potential of Temporal Self-Supervision

Test-Time Retrieval-Augmented Adaptation for Vision-Language Models

Multiview Aerial Visual RECognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?

Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning

CoLLM: A Large Language Model for Composed Image Retrieval