Leonid Karlinsky

30 Papers
249 Total Citations

Papers (30)

Listen, Think, and Understand

ICLR 2024 (arXiv)
221
citations

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

ICLR 2025 (arXiv)
19
citations

Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features

ICCV 2025
3
citations

CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment

CVPR 2025 (arXiv)
2
citations

Sample- and Parameter-Efficient Auto-Regressive Image Models

CVPR 2025 (arXiv)
2
citations

Teaching VLMs to Localize Specific Objects from In-context Examples

ICCV 2025 (arXiv)
2
citations

BATCLIP: Bimodal Online Test-Time Adaptation for CLIP

ICCV 2025 (arXiv)
0
citations

LiveXiv - A Multi-Modal live benchmark based on Arxiv papers content

ICLR 2025 (arXiv)
0
citations

Fine-Grained Angular Contrastive Learning With Coarse Labels

CVPR 2021 (arXiv)
0
citations

Task2Sim: Towards Effective Pre-Training and Transfer From Synthetic Data

CVPR 2022 (arXiv)
0
citations

Unsupervised Domain Generalization by Learning a Bridge Across Domains

CVPR 2022 (arXiv)
0
citations

CODA-Prompt: COntinual Decomposed Attention-Based Prompting for Rehearsal-Free Continual Learning

CVPR 2023
0
citations

ConStruct-VL: Data-Free Continual Structured VL Concepts Learning

CVPR 2023
0
citations

Teaching Structured Vision & Language Concepts to Vision & Language Models

CVPR 2023
0
citations

A Broad Study on the Transferability of Visual Representations With Contrastive Learning

ICCV 2021 (arXiv)
0
citations

Detector-Free Weakly Supervised Grounding by Separation

ICCV 2021 (arXiv)
0
citations

Going Beyond Nouns With Vision & Language Models Using Synthetic Data

ICCV 2023 (arXiv)
0
citations

MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge

ICCV 2023 (arXiv)
0
citations

AR-Net: Adaptive Frame Resolution for Efficient Action Recognition

ECCV 2020
0
citations

OnlineAugment: Online Data Augmentation with Less Domain Knowledge

ECCV 2020
0
citations

TAFSSL: Task-Adaptive Feature Sub-Space Learning for few-shot classification

ECCV 2020
0
citations

A Broader Study of Cross-Domain Few-Shot Learning

ECCV 2020
0
citations

Self-Supervised Classification Network

ECCV 2022
0
citations

Dynamic Distillation Network for Cross-Domain Few-Shot Recognition with Unlabeled Data

NeurIPS 2021 (arXiv)
0
citations

Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens

NeurIPS 2022 (arXiv)
0
citations

FETA: Towards Specializing Foundational Models for Expert Task Applications

NeurIPS 2022 (arXiv)
0
citations

How Transferable are Video Representations Based on Synthetic Data?

NeurIPS 2022
0
citations

LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections

NeurIPS 2023 (arXiv)
0
citations

Learning Human Action Recognition Representations Without Real Humans

NeurIPS 2023 (arXiv)
0
citations

Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models

NeurIPS 2023 (arXiv)
0
citations