Leonid Karlinsky

30 Papers

249 Total Citations

Papers (30)

Listen, Think, and Understand

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features

CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment

Sample- and Parameter-Efficient Auto-Regressive Image Models

Teaching VLMs to Localize Specific Objects from In-context Examples

BATCLIP: Bimodal Online Test-Time Adaptation for CLIP

LiveXiv - A Multi-Modal live benchmark based on Arxiv papers content

Fine-Grained Angular Contrastive Learning With Coarse Labels

Task2Sim: Towards Effective Pre-Training and Transfer From Synthetic Data

Unsupervised Domain Generalization by Learning a Bridge Across Domains

CODA-Prompt: COntinual Decomposed Attention-Based Prompting for Rehearsal-Free Continual Learning

ConStruct-VL: Data-Free Continual Structured VL Concepts Learning

Teaching Structured Vision & Language Concepts to Vision & Language Models

A Broad Study on the Transferability of Visual Representations With Contrastive Learning

Detector-Free Weakly Supervised Grounding by Separation

Going Beyond Nouns With Vision & Language Models Using Synthetic Data

MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge

AR-Net: Adaptive Frame Resolution for Efficient Action Recognition

OnlineAugment: Online Data Augmentation with Less Domain Knowledge

TAFSSL: Task-Adaptive Feature Sub-Space Learning for few-shot classification

A Broader Study of Cross-Domain Few-Shot Learning

Self-Supervised Classification Network

Dynamic Distillation Network for Cross-Domain Few-Shot Recognition with Unlabeled Data

NeurIPS 2021 (arXiv)

Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens

NeurIPS 2022 (arXiv)

FETA: Towards Specializing Foundational Models for Expert Task Applications

NeurIPS 2022 (arXiv)

How Transferable are Video Representations Based on Synthetic Data?

LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections

NeurIPS 2023 (arXiv)

Learning Human Action Recognition Representations Without Real Humans

NeurIPS 2023 (arXiv)

Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models

NeurIPS 2023 (arXiv)