Leonid Karlinsky

Papers: 8

Total Citations: 260

Papers (8)

Listen, Think, and Understand

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

LiveXiv - A Multi-Modal live benchmark based on Arxiv papers content

Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features

CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment

Sample- and Parameter-Efficient Auto-Regressive Image Models

Teaching VLMs to Localize Specific Objects from In-context Examples

BATCLIP: Bimodal Online Test-Time Adaptation for CLIP