Salman Khan
69
Papers
215
Total Citations
Papers (69)
Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery
CVPR 2024
78
citations
Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning
CVPR 2024
34
citations
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
CVPR 2025
30
citations
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
ICCV 2025
24
citations
Composed Video Retrieval via Enriched Context and Discriminative Embeddings
CVPR 2024
20
citations
O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models
CVPR 2025
9
citations
AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs
ICCV 2025
6
citations
GroupMamba: Efficient Group-Based Visual State Space Model
CVPR 2025
6
citations
MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks
NeurIPS 2025
5
citations
TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models
ICCV 2025
2
citations
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications
ICCV 2025
1
citations
GLaMM: Pixel Grounding Large Multimodal Model
CVPR 2024
0
citations
Bidirectional Reciprocative Information Communication for Few-Shot Semantic Segmentation
ICML 2024
0
citations
Striking the Right Balance With Uncertainty
CVPR 2019
0
citations
Semi-Supervised Learning for Few-Shot Image-to-Image Translation
CVPR 2020
0
citations
CycleISP: Real Image Restoration via Improved Data Synthesis
CVPR 2020
0
citations
A Self-supervised Approach for Adversarial Robustness
CVPR 2020
0
citations
AnimalWeb: A Large-Scale Hierarchical Dataset of Annotated Animal Faces
CVPR 2020
0
citations
iTAML: An Incremental Task-Agnostic Meta-learning Approach
CVPR 2020
0
citations
Towards Open World Object Detection
CVPR 2021
0
citations
Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning
CVPR 2021
0
citations
Multi-Stage Progressive Image Restoration
CVPR 2021
0
citations
OW-DETR: Open-World Detection Transformer
CVPR 2022
0
citations
Burst Image Restoration and Enhancement
CVPR 2022
0
citations
Restormer: Efficient Transformer for High-Resolution Image Restoration
CVPR 2022
0
citations
Energy-Based Latent Aligner for Incremental Learning
CVPR 2022
0
citations
Spatio-Temporal Relation Modeling for Few-Shot Action Recognition
CVPR 2022
0
citations
Self-Supervised Video Transformer
CVPR 2022
0
citations
PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery
CVPR 2023
0
citations
Burstormer: Burst Image Restoration and Enhancement Transformer
CVPR 2023
0
citations
Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection
CVPR 2023
0
citations
Person Image Synthesis via Denoising Diffusion Model
CVPR 2023
0
citations
Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection
CVPR 2023
0
citations
MaPLe: Multi-Modal Prompt Learning
CVPR 2023
0
citations
Vita-CLIP: Video and Text Adaptive CLIP via Multimodal Prompting
CVPR 2023
0
citations
Fine-Tuned CLIP Models Are Efficient Video Learners
CVPR 2023
0
citations
Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement
CVPR 2023
0
citations
Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks
ICCV 2019
0
citations
Transductive Learning for Zero-Shot Object Detection
ICCV 2019
0
citations
Gaussian Affinity for Max-Margin Class Imbalanced Learning
ICCV 2019
0
citations
Ground-to-Aerial Image Geo-Localization With a Hard Exemplar Reweighting Triplet Loss
ICCV 2019
0
citations
Orthogonal Projection Loss
ICCV 2021
0
citations
Discriminative Region-Based Multi-Label Zero-Shot Learning
ICCV 2021
0
citations
Handwriting Transformers
ICCV 2021
0
citations
On Generating Transferable Targeted Perturbations
ICCV 2021
0
citations
Self-regulating Prompts: Foundational Model Adaptation without Forgetting
ICCV 2023
0
citations
Towards Instance-adaptive Inference for Federated Learning
ICCV 2023
0
citations
Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning
ICCV 2023
0
citations
Generative Multiplane Neural Radiance for 3D-Aware Image Generation
ICCV 2023
0
citations
SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
ICCV 2023
0
citations
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
ICCV 2023
0
citations
Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation
ICCV 2023
0
citations
Fixing Localization Errors to Improve Image Classification
ECCV 2020
0
citations
Learning Enriched Features for Real Image Restoration and Enhancement
ECCV 2020
0
citations
Class-Agnostic Object Detection with Multi-modal Transformer
ECCV 2022
0
citations
DoodleFormer: Creative Sketch Drawing with Transformers
ECCV 2022
0
citations
Video Instance Segmentation via Multi-Scale Spatio-Temporal Split Attention Transformer
ECCV 2022
0
citations
OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning
ECCV 2022
0
citations
Learning Disentanglement with Decoupled Labels for Vision-Language Navigation
ECCV 2022
0
citations
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
CVPR 2025
0
citations
EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues
CVPR 2025
0
citations
Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model
ICCV 2025
0
citations
LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation
ICCV 2025
0
citations
Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation
ICCV 2025
0
citations
VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering
AAAI 2025
0
citations
S3A: Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment
AAAI 2024
0
citations
GeoChat: Grounded Large Vision-Language Model for Remote Sensing
CVPR 2024
0
citations
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
CVPR 2024
0
citations
Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning
CVPR 2024
0
citations