🧬Learning Paradigms

Multi-Task Learning

Learning multiple tasks jointly

100 papers3,035 total citations
Compare with other topics
Feb '24 Jan '26706 papers
Also includes: multi-task learning, multitask, joint learning, auxiliary tasks

Top Papers

#1

Sequential Modeling Enables Scalable Learning for Large Vision Models

Yutong Bai, Xinyang Geng, Karttikeya Mangalam et al.

CVPR 2024
230
citations
#2

Why Do Multi-Agent LLM Systems Fail?

Mert Cemri, Melissa Z Pan, Shuyi Yang et al.

NeurIPS 2025arXiv:2503.13657
multi-agent llm systemsfailure pattern analysissystem failure taxonomyllm-as-a-judge+3
188
citations
#3

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Guangxuan Xiao, Jiaming Tang, Jingwei Zuo et al.

ICLR 2025arXiv:2410.10819
kv cache pruninglong-context inferenceattention headsretrieval heads+4
165
citations
#4

UniIR: Training and Benchmarking Universal Multimodal Information Retrievers

Cong Wei, Yang Chen, Haonan Chen et al.

ECCV 2024
127
citations
#5

Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks

MohammadReza Davari, Eugene Belilovsky

ECCV 2024
98
citations
#6

MLVU: Benchmarking Multi-task Long Video Understanding

Junjie Zhou, Yan Shu, Bo Zhao et al.

CVPR 2025
89
citations
#7

Reliable Conflictive Multi-View Learning

Cai Xu, Jiajun Si, Ziyu Guan et al.

AAAI 2024arXiv:2402.16897
multi-view learningconflictive instancesevidential learningopinion aggregation+2
88
citations
#8

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?

Jingfeng Wu, Difan Zou, Zixiang Chen et al.

ICLR 2024
85
citations
#9

VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding

Yi Xin, Junlong Du, Qiang Wang et al.

AAAI 2024arXiv:2312.08733
parameter-efficient transfer learningmulti-task adaptationdense scene understandingvision transformer adapter+2
82
citations
#10

MMTEB: Massive Multilingual Text Embedding Benchmark

Kenneth Enevoldsen, Isaac Chung, Imene Kerboua et al.

ICLR 2025arXiv:2502.13595
text embedding evaluationmultilingual benchmarksinstruction following taskslong-document retrieval+4
74
citations
#11

SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

Zhixuan Liang, Yao Mu, Hengbo Ma et al.

CVPR 2024
64
citations
#12

Attention-Challenging Multiple Instance Learning for Whole Slide Image Classification

Yunlong Zhang, Honglin Li, YUXUAN SUN et al.

ECCV 2024
61
citations
#13

HGPrompt: Bridging Homogeneous and Heterogeneous Graphs for Few-Shot Prompt Learning

Xingtong Yu, Yuan Fang, Zemin Liu et al.

AAAI 2024arXiv:2312.01878
graph neural networksheterogeneous graph representationfew-shot learningprompt learning+4
59
citations
#14

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints

Jianhong Bai, Menghan Xia, Xintao WANG et al.

ICLR 2025
55
citations
#15

Task Singular Vectors: Reducing Task Interference in Model Merging

Antonio Andrea Gargiulo, Donato Crisostomi, Maria Sofia Bucarelli et al.

CVPR 2025
53
citations
#16

TabM: Advancing tabular deep learning with parameter-efficient ensembling

Yury Gorishniy, Akim Kotelnikov, Artem Babenko

ICLR 2025
45
citations
#17

PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-agent Tasks

Matthew Chang, Gunjan Chhablani, Alexander Clegg et al.

ICLR 2025
44
citations
#18

XKD: Cross-Modal Knowledge Distillation with Domain Alignment for Video Representation Learning

Pritam Sarkar, Ali Etemad

AAAI 2024arXiv:2211.13929
cross-modal knowledge distillationmasked data reconstructiondomain alignment strategyvideo representation learning+4
38
citations
#19

STEM: Unleashing the Power of Embeddings for Multi-Task Recommendation

Liangcai Su, Junwei Pan, Ximei Wang et al.

AAAI 2024arXiv:2308.13537
multi-task learningrecommender systemsnegative transfershared embeddings+3
37
citations
#20

Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction

Jiatong Shi, Hirofumi Inaguma, Xutai Ma et al.

ICLR 2024
36
citations
#21

COMBO: Compositional World Models for Embodied Multi-Agent Cooperation

Hongxin Zhang, Zeyuan Wang, Qiushi Lyu et al.

ICLR 2025
33
citations
#22

Jointly Training Large Autoregressive Multimodal Models

Emanuele Aiello, Lili Yu, Yixin Nie et al.

ICLR 2024
32
citations
#23

Graph-Aware Contrasting for Multivariate Time-Series Classification

Yucheng Wang, Yuecong Xu, Jianfei Yang et al.

AAAI 2024arXiv:2309.05202
contrastive learningmultivariate time seriesgraph augmentationsspatial consistency+4
32
citations
#24

What to align in multimodal contrastive learning?

Benoit Dufumier, Javiera Castillo Navarro, Devis Tuia et al.

ICLR 2025
30
citations
#25

UMIE: Unified Multimodal Information Extraction with Instruction Tuning

Lin Sun, Kai Zhang, Qingyuan Li et al.

AAAI 2024arXiv:2401.03082
multimodal information extractioninstruction tuningunified modelgeneration problem+3
29
citations
#26

LAMM: Label Alignment for Multi-Modal Prompt Learning

Jingsheng Gao, Jiacheng Ruan, Suncheng Xiang et al.

AAAI 2024arXiv:2312.08212
prompt tuningvisual-language modelslabel alignmentfew-shot learning+3
28
citations
#27

Progressive Pretext Task Learning for Human Trajectory Prediction

Xiaotong Lin, Tianming Liang, Jian-Huang Lai et al.

ECCV 2024
27
citations
#28

No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces

Daniel Marczak, Simone Magistri, Sebastian Cygert et al.

ICML 2025
27
citations
#29

KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems

Jusheng Zhang, Zimeng Huang, Yijia Fan et al.

ICML 2025
26
citations
#30

eTag: Class-Incremental Learning via Embedding Distillation and Task-Oriented Generation

Libo Huang, Yan Zeng, Chuanguang Yang et al.

AAAI 2024
26
citations
#31

MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders

Baijiong Lin, Weisen Jiang, Pengguang Chen et al.

ECCV 2024
25
citations
#32

DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification

Wenhui Zhu, Xiwen Chen, Peijie Qiu et al.

ECCV 2024arXiv:2407.03575
multiple instance learningwhole slide image classificationattention mechanismdiversity modeling+3
24
citations
#33

Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment

Ziyu Shan, Yujie Zhang, Qi Yang et al.

CVPR 2024
24
citations
#34

Training-Free Pretrained Model Merging

Zhengqi Xu, Ke Yuan, Huiqiong Wang et al.

CVPR 2024
24
citations
#35

Contrastive Learning for DeepFake Classification and Localization via Multi-Label Ranking

Cheng-Yao Hong, Yen-Chi Hsu, Tyng-Luh Liu

CVPR 2024
24
citations
#36

Decoupled Spatio-Temporal Consistency Learning for Self-Supervised Tracking

Yaozong Zheng, Bineng Zhong, Qihua Liang et al.

AAAI 2025
24
citations
#37

MMQA: Evaluating LLMs with Multi-Table Multi-Hop Complex Questions

Jian Wu, Linyi Yang, Dongyuan Li et al.

ICLR 2025
tabular data understandingmulti-table question answeringtext-to-sql generationmulti-hop reasoning+4
23
citations
#38

LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging

Ke Wang, Nikos Dimitriadis, Alessandro Favero et al.

ICLR 2025
23
citations
#39

Robust Tracking via Mamba-based Context-aware Token Learning

Jinxia Xie, Bineng Zhong, Qihua Liang et al.

AAAI 2025
22
citations
#40

Category-Level Multi-Part Multi-Joint 3D Shape Assembly

Yichen Li, Kaichun Mo, Yueqi Duan et al.

CVPR 2024
22
citations
#41

Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget

Johannes Lehner, Benedikt Alkin, Andreas Fürst et al.

AAAI 2024arXiv:2304.10520
masked image modelingmasked autoencodersinstance discriminationcontrastive tuning+4
21
citations
#42

TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data

Siyi Du, Shaoming Zheng, Yinsong Wang et al.

ECCV 2024
21
citations
#43

AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors

Ruoxuan Feng, Jiangyu Hu, Wenke Xia et al.

ICLR 2025
21
citations
#44

GOAL: A Generalist Combinatorial Optimization Agent Learner

Darko Drakulić, Sofia Michel, Jean-Marc Andreoli

ICLR 2025
20
citations
#45

Improving Plasticity in Online Continual Learning via Collaborative Learning

Maorong Wang, Nicolas Michel, Ling Xiao et al.

CVPR 2024
20
citations
#46

HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras

Zhongyu Xia, ZhiWei Lin, Xinhao Wang et al.

ECCV 2024arXiv:2404.02517
multi-view cameras3d object detectionbird's-eye-view segmentationtemporal feature integration+4
19
citations
#47

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

ziang yan, Zhilin Li, Yinan He et al.

CVPR 2025arXiv:2412.19326
multimodal large language modelsvision task alignmenttask preference optimizationdifferentiable task preferences+3
19
citations
#48

Task-driven Image Fusion with Learnable Fusion Loss

Haowen Bai, Jiangshe Zhang, Zixiang Zhao et al.

CVPR 2025
19
citations
#49

Class Incremental Learning via Likelihood Ratio Based Task Prediction

Haowei Lin, Yijia Shao, Weinan Qian et al.

ICLR 2024
18
citations
#50

TIME-FS: Joint Learning of Tensorial Incomplete Multi-View Unsupervised Feature Selection and Missing-View Imputation

Yanyong Huang, Minghui Lu, Wei Huang et al.

AAAI 2025
18
citations
#51

Dense Audio-Visual Event Localization Under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration

Ziheng Zhou, Jinxing Zhou, Wei Qian et al.

AAAI 2025
18
citations
#52

Cloud-Device Collaborative Learning for Multimodal Large Language Models

Guanqun Wang, Jiaming Liu, Chenxuan Li et al.

CVPR 2024
18
citations
#53

Merging on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging

Anke Tang, Enneng Yang, Li Shen et al.

NeurIPS 2025
18
citations
#54

T4P: Test-Time Training of Trajectory Prediction via Masked Autoencoder and Actor-specific Token Memory

Daehee Park, Jaeseok Jeong, Sung-Hoon Yoon et al.

CVPR 2024
18
citations
#55

BCLNet: Bilateral Consensus Learning for Two-View Correspondence Pruning

Xiangyang Miao, Guobao Xiao, Shiping Wang et al.

AAAI 2024arXiv:2401.03459
correspondence pruningcamera pose estimationself-attention blocktwo-view geometry+2
18
citations
#56

MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation

Donggon Jang, Yucheol Cho, Suin Lee et al.

ICLR 2025
17
citations
#57

AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment

Yan Li, Yifei Xing, Xiangyuan Lan et al.

CVPR 2025
17
citations
#58

Mixture of Noise for Pre-Trained Model-Based Class-Incremental Learning

Kai Jiang, Zhengyan Shi, Dell Zhang et al.

NeurIPS 2025
16
citations
#59

Three Heads Are Better than One: Complementary Experts for Long-Tailed Semi-supervised Learning

Chengcheng Ma, Ismail Elezi, Jiankang Deng et al.

AAAI 2024arXiv:2312.15702
long-tailed learningsemi-supervised learningpseudo-label generationclass distribution mismatch+3
16
citations
#60

Every Node Is Different: Dynamically Fusing Self-Supervised Tasks for Attributed Graph Clustering

Pengfei Zhu, Qian Wang, Yu Wang et al.

AAAI 2024arXiv:2401.06595
attributed graph clusteringself-supervised learningmulti-task learningdynamic task weighting+3
16
citations
#61

DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding

Jincen Jiang, Qianyu Zhou, Yuhang Li et al.

ECCV 2024arXiv:2407.08801
domain generalizationpoint cloud understandingin-context learningpoint cloud reconstruction+4
15
citations
#62

Quad Bayer Joint Demosaicing and Denoising Based on Dual Encoder Network with Joint Residual Learning

Bolun Zheng, Li Haoran, Quan Chen et al.

AAAI 2024
15
citations
#63

Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment

Alireza Ganjdanesh, Shangqian Gao, Heng Huang

CVPR 2024
15
citations
#64

UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection

Yingsen Zeng, Yujie Zhong, Chengjian Feng et al.

ECCV 2024
14
citations
#65

Learning Intra-view and Cross-view Geometric Knowledge for Stereo Matching

Rui Gong, Weide Liu, ZAIWANG GU et al.

CVPR 2024
14
citations
#66

Learning Instance-Aware Correspondences for Robust Multi-Instance Point Cloud Registration in Cluttered Scenes

Zhiyuan Yu, Zheng Qin, lintao zheng et al.

CVPR 2024
14
citations
#67

A Second-Order Perspective on Model Compositionality and Incremental Learning

Angelo Porrello, Lorenzo Bonicelli, Pietro Buzzega et al.

ICLR 2025
14
citations
#68

Tri-Modal Motion Retrieval by Learning a Joint Embedding Space

Kangning Yin, Shihao Zou, Yuxuan Ge et al.

CVPR 2024
14
citations
#69

MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning

Yaming Yang, Dilxat Muhtar, Yelong Shen et al.

AAAI 2025
13
citations
#70

Multi-Label Cluster Discrimination for Visual Representation Learning

Xiang An, Kaicheng Yang, Xiangzi Dai et al.

ECCV 2024arXiv:2407.17331
contrastive language image pre-trainingimage-text contrastive learningcluster discriminationmulti-label classification+3
12
citations
#71

HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding

Shehreen Azad, Vibhav Vineet, Yogesh S. Rawat

CVPR 2025
12
citations
#72

Enhancing Multi-Robot Semantic Navigation Through Multimodal Chain-of-Thought Score Collaboration

Zhixuan Shen, Haonan Luo, Kexun Chen et al.

AAAI 2025
12
citations
#73

Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace

Jinluan Yang, Anke Tang, Didi Zhu et al.

ICLR 2025
12
citations
#74

Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition

Chuanguang Yang, XinQiang Yu, Han Yang et al.

AAAI 2025
12
citations
#75

Training-Free Model Merging for Multi-target Domain Adaptation

Wenyi Li, Huan-ang Gao, Mingju Gao et al.

ECCV 2024
11
citations
#76

Adapter Merging with Centroid Prototype Mapping for Scalable Class-Incremental Learning

Takuma Fukuda, Hiroshi Kera, Kazuhiko Kawamoto

CVPR 2025
11
citations
#77

Quantifying Task Priority for Multi-Task Optimization

Wooseong Jeong, Kuk-Jin Yoon

CVPR 2024
11
citations
#78

RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception

Shen Jianbing, Chunliang Li, Wencheng Han et al.

ECCV 2024
10
citations
#79

Let All Be Whitened: Multi-Teacher Distillation for Efficient Visual Retrieval

Zhe Ma, Jianfeng Dong, Shouling Ji et al.

AAAI 2024arXiv:2312.09716
visual retrievalmulti-teacher distillationknowledge distillationmodel efficiency+3
10
citations
#80

SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining

Chull Hwan Song, Taebaek Hwang, Jooyoung Yoon et al.

CVPR 2024
10
citations
#81

Data-Efficient Multimodal Fusion on a Single GPU

Noël Vouitsis, Zhaoyan Liu, Satya Krishna Gorti et al.

CVPR 2024
10
citations
#82

FedLPS: Heterogeneous Federated Learning for Multiple Tasks with Local Parameter Sharing

Yongzhe Jia, Xuyun Zhang, Amin Beheshti et al.

AAAI 2024arXiv:2402.08578
federated learningedge computingparameter sharingmodel pruning+4
10
citations
#83

PARSAC: Accelerating Robust Multi-Model Fitting with Parallel Sample Consensus

Florian Kluger, Bodo Rosenhahn

AAAI 2024arXiv:2401.14919
multi-model fittinggeometric model estimationvanishing point detectionplanar homography estimation+4
10
citations
#84

Not All Tasks Are Equally Difficult: Multi-Task Deep Reinforcement Learning with Dynamic Depth Routing

Jinmin He, Kai Li, Yifan Zang et al.

AAAI 2024arXiv:2312.14472
multi-task reinforcement learningdynamic depth routingparameter sharingrouting network+3
10
citations
#85

Pareto Set Learning for Multi-Objective Reinforcement Learning

Erlong Liu, Yu-Chang Wu, Xiaobin Huang et al.

AAAI 2025
10
citations
#86

TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception

Zhiying Song, Lei Yang, Fuxi Wen et al.

CVPR 2025
9
citations
#87

RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything

Shilin Xu, Haobo Yuan, Qingyu Shi et al.

ICLR 2025
9
citations
#88

ConfigX: Modular Configuration for Evolutionary Algorithms via Multitask Reinforcement Learning

Hongshu Guo, Zeyuan Ma, Jiacheng Chen et al.

AAAI 2025
9
citations
#89

Live and Learn: Continual Action Clustering with Incremental Views

Xiaoqiang Yan, Yingtao Gan, Yiqiao Mao et al.

AAAI 2024arXiv:2404.07962
multi-view action clusteringcontinual learningincremental camera viewsconsensus partition matrix+2
9
citations
#90

Multi-Pair Temporal Sentence Grounding via Multi-Thread Knowledge Transfer Network

Xiang Fang, Wanlong Fang, Changshuo Wang et al.

AAAI 2025
9
citations
#91

MergeBench: A Benchmark for Merging Domain-Specialized LLMs

Yifei He, Siqi Zeng, Yuzheng Hu et al.

NeurIPS 2025
8
citations
#92

Task-Adaptive Saliency Guidance for Exemplar-free Class Incremental Learning

Xialei Liu, Jiang-Tian Zhai, Andrew Bagdanov et al.

CVPR 2024
8
citations
#93

Incomplete Multi-view Deep Clustering with Data Imputation and Alignment

Jiyuan Liu, Xinwang Liu, Xinhang Wan et al.

NeurIPS 2025
incomplete multi-view clusteringdata imputationlatent representation alignmentmulti-modal learning+2
8
citations
#94

ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL

Yang Qin, Chao Chen, Zhihang Fu et al.

ICLR 2025
8
citations
#95

MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents

Junpeng Yue, Xinrun Xu, Börje F. Karlsson et al.

ICLR 2025
8
citations
#96

Continual Multimodal Contrastive Learning

Xiaohao Liu, Xiaobo Xia, See-Kiong Ng et al.

NeurIPS 2025
8
citations
#97

MergeNet: Knowledge Migration Across Heterogeneous Models, Tasks, and Modalities

Kunxi Li, Tianyu Zhan, Kairui Fu et al.

AAAI 2025
8
citations
#98

Mimic In-Context Learning for Multimodal Tasks

Yuchu Jiang, Jiale Fu, chenduo hao et al.

CVPR 2025
8
citations
#99

Activation-Informed Merging of Large Language Models

Amin Heyrani Nobari, Kaveh Alimohammadi, Ali ArjomandBigdeli et al.

NeurIPS 2025
7
citations
#100

H-ensemble: An Information Theoretic Approach to Reliable Few-Shot Multi-Source-Free Transfer

Yanru Wu, Jianning Wang, Weida Wang et al.

AAAI 2024arXiv:2312.12489
multi-source transfer learningfew-shot learningtransferability metricssource model ensemble+4
7
citations