"zero-shot learning" Papers

59 papers found • Page 1 of 2

Advancing Generalizable Tumor Segmentation with Anomaly-Aware Open-Vocabulary Attention Maps and Frozen Foundation Diffusion Models

Yankai Jiang, Peng Zhang, Donglin Yang et al.

CVPR 2025 • poster • arXiv:2505.02753

AmorLIP: Efficient Language-Image Pretraining via Amortization

Haotian Sun, Yitong Li, Yuchen Zhuang et al.

NeurIPS 2025 • poster • arXiv:2505.18983 • 2 citations

Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt Learning

Hairui Ren, Fan Tang, He Zhao et al.

CVPR 2025 • poster • arXiv:2504.11930

Boosting Skeleton-based Zero-Shot Action Recognition with Training-Free Test-Time Adaptation

Jingmin Zhu, Anqi Zhu, Hossein Rahmani et al.

NeurIPS 2025 • poster • arXiv:2512.11458

Can LLMs Understand Time Series Anomalies?

Zihao Zhou, Rose Yu

ICLR 2025 • poster • arXiv:2410.05440 • 32 citations

CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale

ZeMing Gong, Austin Wang, Xiaoliang Huo et al.

ICLR 2025 • poster • arXiv:2405.17537 • 18 citations

CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology

Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.

CVPR 2025 • poster • arXiv:2412.12077 • 22 citations

CrypticBio: A Large Multimodal Dataset for Visually Confusing Species

Georgiana Manolache, Gerard Schouten, Joaquin Vanschoren

NeurIPS 2025 • oral

Dense Video Object Captioning from Disjoint Supervision

Xingyi Zhou, Anurag Arnab, Chen Sun et al.

ICLR 2025 • oral • arXiv:2306.11729 • 7 citations

Diorama: Unleashing Zero-shot Single-view 3D Indoor Scene Modeling

Qirui Wu, Denys Iliash, Daniel Ritchie et al.

ICCV 2025 • highlight • arXiv:2411.19492 • 2 citations

HALL-E: Hierarchical Neural Codec Language Model for Minute-Long Zero-Shot Text-to-Speech Synthesis

Yuto Nishimura, Takumi Hirose, Masanari Ohi et al.

ICLR 2025 • poster • arXiv:2410.04380 • 5 citations

InstructHOI: Context-Aware Instruction for Multi-Modal Reasoning in Human-Object Interaction Detection

Jinguo Luo, Weihong Ren, Quanlong Zheng et al.

NeurIPS 2025 • spotlight

Locality-Aware Zero-Shot Human-Object Interaction Detection

Sanghyun Kim, Deunsol Jung, Minsu Cho

CVPR 2025 • poster • arXiv:2505.19503

MetaOOD: Automatic Selection of OOD Detection Models

Yuehan Qin, Yichi Zhang, Yi Nian et al.

ICLR 2025 • poster • arXiv:2410.03074 • 16 citations

MotionDiff: Training-free Zero-shot Interactive Motion Editing via Flow-assisted Multi-view Diffusion

Yikun Ma, Yiqing Li, Jiawei Wu et al.

ICCV 2025 • poster • arXiv:2503.17695 • 1 citation

PostCast: Generalizable Postprocessing for Precipitation Nowcasting via Unsupervised Blurriness Modeling

Junchao Gong, Siwei Tu, Weidong Yang et al.

ICLR 2025 • oral • arXiv:2410.05805 • 7 citations

RESAnything: Attribute Prompting for Arbitrary Referring Segmentation

Ruiqi Wang, Hao Zhang

NeurIPS 2025 • poster • arXiv:2505.02867 • 2 citations

Teaching Human Behavior Improves Content Understanding Abilities Of VLMs

SOMESH SINGH, Harini S I, Yaman Singla et al.

ICLR 2025 • poster • 2 citations

Unleashing the Potential of Multimodal LLMs for Zero-Shot Spatio-Temporal Video Grounding

Zaiquan Yang, Yuhao LIU, Gerhard Hancke et al.

NeurIPS 2025 • oral • arXiv:2509.15178 • 2 citations

Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning

Huajie Jiang, Zhengxian Li, Xiaohan Yu et al.

CVPR 2025 • poster • arXiv:2503.23030 • 1 citation

Zero-shot protein stability prediction by inverse folding models: a free energy interpretation

Jes Frellsen, Maher Kassem, Tone Bengtsen et al.

NeurIPS 2025 • poster • arXiv:2506.05596 • 2 citations

Z-Magic: Zero-shot Multiple Attributes Guided Image Creator

Yingying Deng, Xiangyu He, Fan Tang et al.

CVPR 2025 • poster • arXiv:2503.12124

${\rm E}(3)$-Equivariant Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning

Dingyang Chen, Qi Zhang

ICML 2024 • poster

A decoder-only foundation model for time-series forecasting

Abhimanyu Das, Weihao Kong, Rajat Sen et al.

ICML 2024 • oral

A Fixed-Point Approach for Causal Generative Modeling

Meyer Scetbon, Joel Jennings, Agrin Hilmkil et al.

ICML 2024 • poster

BDIQA: A New Dataset for Video Question Answering to Explore Cognitive Reasoning through Theory of Mind

Yuanyuan Mao, Xin Lin, Qin Ni et al.

AAAI 2024 • paper • arXiv:2402.07402

Chinese Spelling Correction as Rephrasing Language Model

Linfeng Liu, Hongqiu Wu, Hai Zhao

AAAI 2024 • paper • arXiv:2308.08796 • 29 citations

Commonsense for Zero-Shot Natural Language Video Localization

Meghana Holla, Ismini Lourentzou

AAAI 2024 • paper • arXiv:2312.17429 • 5 citations

Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval

Yuanmin Tang, Jing Yu, Keke Gai et al.

AAAI 2024 • paper • arXiv:2309.16137 • 57 citations

CORECODE: A Common Sense Annotated Dialogue Dataset with Benchmark Tasks for Chinese Large Language Models

Dan Shi, Chaobin You, Jian-Tao Huang et al.

AAAI 2024 • paper • arXiv:2312.12853 • 2 citations

Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification

Long-Fei Li, Peng Zhao, Zhi-Hua Zhou

AAAI 2024 • paper • arXiv:2407.08787 • 4 citations

Data-Free Generalized Zero-Shot Learning

Bowen Tang, Jing Zhang, Yan Long et al.

AAAI 2024 • paper • arXiv:2401.15657

DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection

Zhi Zhou, Ming Yang, Jiang-Xin Shi et al.

ICML 2024 • poster

ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis

Jungil Kong, Junmo Lee, Jeongmin Kim et al.

ICML 2024 • poster

GroundVLP: Harnessing Zero-Shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection

Haozhan Shen, Tiancheng Zhao, Mingwei Zhu et al.

AAAI 2024 • paper • arXiv:2312.15043

HiFi-123: Towards High-fidelity One Image to 3D Content Generation

Wangbo Yu, Li Yuan, Yanpei Cao et al.

ECCV 2024 • poster • arXiv:2310.06744 • 34 citations

Image Captioning with Multi-Context Synthetic Data

AAAI 2024 • paper • arXiv:2305.18072

Improving Diffusion Models for Inverse Problems Using Optimal Posterior Covariance

Xinyu Peng, Ziyang Zheng, Wenrui Dai et al.

ICML 2024 • poster

InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions

Ryota Tanaka, Taichi Iki, Kyosuke Nishida et al.

AAAI 2024 • paper • arXiv:2401.13313

Interactive Visual Task Learning for Robots

AAAI 2024 • paper • arXiv:2312.13219

LangCell: Language-Cell Pre-training for Cell Identity Understanding

Suyuan Zhao, Jiahuan Zhang, Yushuai Wu et al.

ICML 2024 • poster

Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields

Yonggan Fu, Huaizhi Qu, Zhifan Ye et al.

ECCV 2024 • poster • arXiv:2403.11131

OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model

Runyi Li, Xuhan SHENG, Weiqi Li et al.

ECCV 2024 • poster • arXiv:2404.10312 • 11 citations

PEARL: Zero-shot Cross-task Preference Alignment and Robust Reward Learning for Robotic Manipulation

Runze Liu, Yali Du, Fengshuo Bai et al.

ICML 2024 • poster

PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

Soroush Nasiriany, Fei Xia, Wenhao Yu et al.

ICML 2024 • poster

Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer

Yaoting Wang, Liu Weisong, Guangyao Li et al.

AAAI 2024 • paper • arXiv:2309.07929 • 38 citations

Revisiting the Role of Language Priors in Vision-Language Models

Zhiqiu Lin, Xinyue Chen, Deepak Pathak et al.

ICML 2024 • poster

SelfVC: Voice Conversion With Iterative Refinement using Self Transformations

Paarth Neekhara, Shehzeen Hussain, Rafael Valle et al.

ICML 2024 • poster

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis

AAAI 2024 • paper • arXiv:2312.10741

Task Contamination: Language Models May Not Be Few-Shot Anymore

Changmao Li, Jeffrey Flanigan

AAAI 2024 • paper • arXiv:2312.16337 • 130 citations