Salman Khan

69
Papers
215
Total Citations

Papers (69)

Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery

CVPR 2024
78
citations

Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning

CVPR 2024
34
citations

VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

CVPR 2025
30
citations

GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks

ICCV 2025
24
citations

Composed Video Retrieval via Enriched Context and Discriminative Embeddings

CVPR 2024
20
citations

O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models

CVPR 2025
9
citations

AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs

ICCV 2025
6
citations

GroupMamba: Efficient Group-Based Visual State Space Model

CVPR 2025
6
citations

MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks

NeurIPS 2025
5
citations

TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models

ICCV 2025
2
citations

Beyond Simple Edits: Composed Video Retrieval with Dense Modifications

ICCV 2025
1
citations

GLaMM: Pixel Grounding Large Multimodal Model

CVPR 2024
0
citations

Bidirectional Reciprocative Information Communication for Few-Shot Semantic Segmentation

ICML 2024
0
citations

Striking the Right Balance With Uncertainty

CVPR 2019
0
citations

Semi-Supervised Learning for Few-Shot Image-to-Image Translation

CVPR 2020
0
citations

CycleISP: Real Image Restoration via Improved Data Synthesis

CVPR 2020
0
citations

A Self-supervised Approach for Adversarial Robustness

CVPR 2020
0
citations

AnimalWeb: A Large-Scale Hierarchical Dataset of Annotated Animal Faces

CVPR 2020
0
citations

iTAML: An Incremental Task-Agnostic Meta-learning Approach

CVPR 2020
0
citations

Towards Open World Object Detection

CVPR 2021
0
citations

Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning

CVPR 2021
0
citations

Multi-Stage Progressive Image Restoration

CVPR 2021
0
citations

OW-DETR: Open-World Detection Transformer

CVPR 2022
0
citations

Burst Image Restoration and Enhancement

CVPR 2022
0
citations

Restormer: Efficient Transformer for High-Resolution Image Restoration

CVPR 2022
0
citations

Energy-Based Latent Aligner for Incremental Learning

CVPR 2022
0
citations

Spatio-Temporal Relation Modeling for Few-Shot Action Recognition

CVPR 2022
0
citations

Self-Supervised Video Transformer

CVPR 2022
0
citations

PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery

CVPR 2023
0
citations

Burstormer: Burst Image Restoration and Enhancement Transformer

CVPR 2023
0
citations

Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection

CVPR 2023
0
citations

Person Image Synthesis via Denoising Diffusion Model

CVPR 2023
0
citations

Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection

CVPR 2023
0
citations

MaPLe: Multi-Modal Prompt Learning

CVPR 2023
0
citations

Vita-CLIP: Video and Text Adaptive CLIP via Multimodal Prompting

CVPR 2023
0
citations

Fine-Tuned CLIP Models Are Efficient Video Learners

CVPR 2023
0
citations

Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement

CVPR 2023
0
citations

Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks

ICCV 2019
0
citations

Transductive Learning for Zero-Shot Object Detection

ICCV 2019
0
citations

Gaussian Affinity for Max-Margin Class Imbalanced Learning

ICCV 2019
0
citations

Ground-to-Aerial Image Geo-Localization With a Hard Exemplar Reweighting Triplet Loss

ICCV 2019
0
citations

Orthogonal Projection Loss

ICCV 2021
0
citations

Discriminative Region-Based Multi-Label Zero-Shot Learning

ICCV 2021
0
citations

Handwriting Transformers

ICCV 2021
0
citations

On Generating Transferable Targeted Perturbations

ICCV 2021
0
citations

Self-regulating Prompts: Foundational Model Adaptation without Forgetting

ICCV 2023
0
citations

Towards Instance-adaptive Inference for Federated Learning

ICCV 2023
0
citations

Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning

ICCV 2023
0
citations

Generative Multiplane Neural Radiance for 3D-Aware Image Generation

ICCV 2023
0
citations

SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications

ICCV 2023
0
citations

Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition

ICCV 2023
0
citations

Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation

ICCV 2023
0
citations

Fixing Localization Errors to Improve Image Classification

ECCV 2020
0
citations

Learning Enriched Features for Real Image Restoration and Enhancement

ECCV 2020
0
citations

Class-Agnostic Object Detection with Multi-modal Transformer

ECCV 2022
0
citations

DoodleFormer: Creative Sketch Drawing with Transformers

ECCV 2022
0
citations

Video Instance Segmentation via Multi-Scale Spatio-Temporal Split Attention Transformer

ECCV 2022
0
citations

OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning

ECCV 2022
0
citations

Learning Disentanglement with Decoupled Labels for Vision-Language Navigation

ECCV 2022
0
citations

All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

CVPR 2025
0
citations

EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues

CVPR 2025
0
citations

Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model

ICCV 2025
0
citations

LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation

ICCV 2025
0
citations

Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation

ICCV 2025
0
citations

VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering

AAAI 2025
0
citations

S3A: Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment

AAAI 2024
0
citations

GeoChat: Grounded Large Vision-Language Model for Remote Sensing

CVPR 2024
0
citations

VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding

CVPR 2024
0
citations

Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning

CVPR 2024
0
citations