Fahad Shahbaz Khan

Papers: 17

Total Citations: 328

Papers (17)

SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation

Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery

Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors

Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning

VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

Composed Video Retrieval via Enriched Context and Discriminative Embeddings

Semi-supervised Open-World Object Detection

GroupMamba: Efficient Group-Based Visual State Space Model

VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning

GeoChat: Grounded Large Vision-Language Model for Remote Sensing

VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding

Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning

GLaMM: Pixel Grounding Large Multimodal Model

All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models

EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues

S3A: Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment