2024 "zero-shot learning" Papers
44 papers found
${\rm E}(3)$-Equivariant Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning
Dingyang Chen, Qi Zhang
A decoder-only foundation model for time-series forecasting
Abhimanyu Das, Weihao Kong, Rajat Sen et al.
A Fixed-Point Approach for Causal Generative Modeling
Meyer Scetbon, Joel Jennings, Agrin Hilmkil et al.
BDIQA: A New Dataset for Video Question Answering to Explore Cognitive Reasoning through Theory of Mind
Yuanyuan Mao, Xin Lin, Qin Ni et al.
C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
Rongchang Li, Zhenhua Feng, Tianyang Xu et al.
Chinese Spelling Correction as Rephrasing Language Model
Linfeng Liu, Hongqiu Wu, Hai Zhao
Commonsense for Zero-Shot Natural Language Video Localization
Meghana Holla, Ismini Lourentzou
Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval
Yuanmin Tang, Jing Yu, Keke Gai et al.
CORECODE: A Common Sense Annotated Dialogue Dataset with Benchmark Tasks for Chinese Large Language Models
Dan Shi, Chaobin You, Jian-Tao Huang et al.
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification
Long-Fei Li, Peng Zhao, Zhi-Hua Zhou
Data-Free Generalized Zero-Shot Learning
Bowen Tang, Jing Zhang, Yan Long et al.
DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection
Zhi Zhou, Ming Yang, Jiang-Xin Shi et al.
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
Jungil Kong, Junmo Lee, Jeongmin Kim et al.
FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance
Jiedong Zhuang, Jiaqi Hu, Lianrui Mu et al.
GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes
Ibrahim Ethem Hamamci, Sezgin Er, Anjany Sekuboyina et al.
GroundVLP: Harnessing Zero-Shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection
Haozhan Shen, Tiancheng Zhao, Mingwei Zhu et al.
HiFi-123: Towards High-fidelity One Image to 3D Content Generation
Wangbo Yu, Li Yuan, Yanpei Cao et al.
Image Captioning with Multi-Context Synthetic Data
Improving Diffusion Models for Inverse Problems Using Optimal Posterior Covariance
Xinyu Peng, Ziyang Zheng, Wenrui Dai et al.
InstructDoc: A Dataset for Zero
Shot Generalization of Visual Document Understanding with Instructions - Ryota Tanaka, Taichi Iki, Kyosuke Nishida et al.
Interactive Visual Task Learning for Robots
LangCell: Language-Cell Pre-training for Cell Identity Understanding
Suyuan Zhao, Jiahuan Zhang, Yushuai Wu et al.
Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields
Yonggan Fu, Huaizhi Qu, Zhifan Ye et al.
OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model
Runyi Li, Xuhan SHENG, Weiqi Li et al.
PEARL: Zero-shot Cross-task Preference Alignment and Robust Reward Learning for Robotic Manipulation
Runze Liu, Yali Du, Fengshuo Bai et al.
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Soroush Nasiriany, Fei Xia, Wenhao Yu et al.
Pix2Gif: Motion-Guided Diffusion for GIF Generation
Hitesh Kandala, Jianfeng Gao, Jianwei Yang
Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer
Yaoting Wang, Liu Weisong, Guangyao Li et al.
Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos
Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang et al.
Recursive Visual Programming
Jiaxin Ge, Sanjay Subramanian, Baifeng Shi et al.
Revisiting the Role of Language Priors in Vision-Language Models
Zhiqiu Lin, Xinyue Chen, Deepak Pathak et al.
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
Paarth Neekhara, Shehzeen Hussain, Rafael Valle et al.
StyleSinger: Style Transfer for Out
of-Domain Singing Voice Synthesis
Task Contamination: Language Models May Not Be Few-Shot Anymore
Changmao Li, Jeffrey Flanigan
Tilt your Head: Activating the Hidden Spatial-Invariance of Classifiers
Johann Schmidt, Sebastian Stober
Towards Causal Foundation Model: on Duality between Optimal Balancing and Attention
Jiaqi Zhang, Joel Jennings, Agrin Hilmkil et al.
Training-free Video Temporal Grounding using Large-scale Pre-trained Models
Minghang Zheng, Xinhao Cai, Qingchao Chen et al.
Video Question Answering with Procedural Programs
Rohan Choudhury, Koichiro Niinuma, Kris Kitani et al.
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context
yunxin li, Baotian Hu, Haoyuan Shi et al.
Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
Jinhao Li, Haopeng Li, Sarah Erfani et al.
Wikiformer: Pre-training with Structured Information of Wikipedia for Ad-Hoc Retrieval
Weihang Su, Qingyao Ai, Xiangsheng Li et al.
Zero-1-to-3: Domain-Level Zero-Shot Cognitive Diagnosis via One Batch of Early-Bird Students towards Three Diagnostic Objectives
Weibo Gao, Qi Liu, Hao Wang et al.
Zero-Shot Image Feature Consensus with Deep Functional Maps
Xinle Cheng, Congyue Deng, Adam Harley et al.
Zero-shot Text-guided Infinite Image Synthesis with LLM guidance
Soyeong Kwon, TAEGYEONG LEE, Taehwan Kim