Salman Khan
23
Papers
216
Total Citations
Papers (23)
Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery
CVPR 2024
78
citations
Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning
CVPR 2024
34
citations
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
CVPR 2025
30
citations
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
ICCV 2025
24
citations
Composed Video Retrieval via Enriched Context and Discriminative Embeddings
CVPR 2024
20
citations
O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models
CVPR 2025
9
citations
GroupMamba: Efficient Group-Based Visual State Space Model
CVPR 2025
6
citations
AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs
ICCV 2025
6
citations
MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks
NeurIPS 2025
5
citations
TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models
ICCV 2025
2
citations
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications
ICCV 2025
1
citations
Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation
ICCV 2025
1
citations
GLaMM: Pixel Grounding Large Multimodal Model
CVPR 2024
0
citations
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
CVPR 2025
0
citations
EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues
CVPR 2025
0
citations
Bidirectional Reciprocative Information Communication for Few-Shot Semantic Segmentation
ICML 2024
0
citations
Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model
ICCV 2025
0
citations
LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation
ICCV 2025
0
citations
VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering
AAAI 2025
0
citations
S3A: Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment
AAAI 2024
0
citations
GeoChat: Grounded Large Vision-Language Model for Remote Sensing
CVPR 2024
0
citations
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
CVPR 2024
0
citations
Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning
CVPR 2024
0
citations