🧬 Architectures

Mixture of Experts

Sparse MoE architectures

100 papers · 5,702 total citations
[Paper volume trend, Feb '24 – Jan '26: 1,452 papers]
Also includes: mixture of experts, moe, sparse mixture, expert routing
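
The papers in this topic share one core mechanism, sparse expert routing: a learned router scores the experts for each token, keeps only the top-k, and combines their outputs with renormalized gate weights, so parameter count grows while per-token compute stays nearly flat. A minimal generic sketch of such a layer follows (names, sizes, and the per-expert dispatch loop are illustrative choices, not taken from any paper listed below):

```python
# Generic sparse MoE layer: a router picks k of n experts per token and the
# output is the gate-weighted sum of the chosen experts' outputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)   # token -> expert logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                             # x: (tokens, d_model)
        gate_logits, idx = self.router(x).topk(self.k, dim=-1)
        gates = F.softmax(gate_logits, dim=-1)        # renormalize over top-k
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):     # dispatch per expert
            for slot in range(self.k):
                hit = idx[:, slot] == e               # tokens routed here
                if hit.any():
                    out[hit] += gates[hit, slot].unsqueeze(-1) * expert(x[hit])
        return out

layer = SparseMoE()
print(layer(torch.randn(16, 64)).shape)               # torch.Size([16, 64])
```

Production systems replace the per-expert loop with batched dispatch and typically add a load-balancing loss so the router does not collapse onto a few experts.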

Top Papers

#1

MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

Yuedong Chen, Haofei Xu, Chuanxia Zheng et al.

ECCV 2024
356 citations
#2

Scaling and evaluating sparse autoencoders

Leo Gao, Tom Dupré la Tour, Henk Tillman et al.

ICLR 2025 · arXiv:2406.04093
sparse autoencoders, language model interpretability, feature extraction, k-sparse autoencoders +4
298 citations
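
Paper #2 centers on k-sparse autoencoders, where a TopK activation keeps only the k largest latents per input, making sparsity exact rather than penalty-tuned. A hedged sketch of that mechanism (layer sizes and the scatter-based masking are illustrative; this is not the authors' released code):

```python
# k-sparse (TopK) autoencoder sketch: encode, keep only the k largest
# latents, decode. Dimensions are placeholders, not the paper's settings.
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    def __init__(self, d_model=64, d_dict=512, k=16):
        super().__init__()
        self.k = k
        self.enc = nn.Linear(d_model, d_dict)
        self.dec = nn.Linear(d_dict, d_model)

    def forward(self, x):
        z = self.enc(x)
        vals, idx = z.topk(self.k, dim=-1)              # k largest latents
        z = torch.zeros_like(z).scatter(-1, idx, vals)  # zero out the rest
        return self.dec(z), z                           # reconstruction, codes

sae = TopKSAE()
recon, codes = sae(torch.randn(8, 64))
print(recon.shape, int((codes != 0).sum(-1).max()))     # at most k active per row
```
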
#3

Mixture-of-Agents Enhances Large Language Model Capabilities

Junlin Wang, Jue Wang, Ben Athiwaratkun et al.

ICLR 2025
274 citations
#4

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Samuel Marks, Can Rager, Eric Michaud et al.

ICLR 2025
252 citations
#5

RoMa: Robust Dense Feature Matching

Johan Edstedt, Qiyu Sun, Georg Bökman et al.

CVPR 2024
238 citations
#6

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Linrui Tian, Qi Wang, Bang Zhang et al.

ECCV 2024
218 citations
#7

SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

Yuan Zhang, Chun-Kai Fan, Junpeng Ma et al.

ICML 2025
190 citations
#8

Why Do Multi-Agent LLM Systems Fail?

Mert Cemri, Melissa Z Pan, Shuyi Yang et al.

NeurIPS 2025 · arXiv:2503.13657
multi-agent llm systems, failure pattern analysis, system failure taxonomy, llm-as-a-judge +3
188 citations
#9

Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

Xiaoming Shi, Shiyu Wang, Yuqi Nie et al.

ICLR 2025
170 citations
#10

Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed

Yifan Wang, Xingyi He, Sida Peng et al.

CVPR 2024
142 citations
#11

Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning

Ted Zadouri, Ahmet Üstün, Arash Ahmadian et al.

ICLR 2024
138 citations
#12

MogaNet: Multi-order Gated Aggregation Network

Siyuan Li, Zedong Wang, Zicheng Liu et al.

ICLR 2024
125 citations
#13

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Min Shi, Fuxiao Liu, Shihao Wang et al.

ICLR 2025 · arXiv:2408.15998
multimodal large language models, vision encoders, optical character recognition, document analysis +4
116 citations
#14

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

Zhengxuan Wu, Aryaman Arora, Atticus Geiger et al.

ICML 2025
100 citations
#15

OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models

Changhun Lee, Jungyu Jin, Taesu Kim et al.

AAAI 2024 · arXiv:2306.02272
weight quantization, large language models, mixed-precision quantization, parameter-efficient fine-tuning +4
100 citations
#16

MoBA: Mixture of Block Attention for Long-Context LLMs

Enzhe Lu, Zhejun Jiang, Jingyuan Liu et al.

NeurIPS 2025 · arXiv:2502.13189
attention mechanism, long-context llms, mixture of experts, sparse attention +2
94 citations
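
Paper #16 (MoBA) applies MoE-style routing to attention: keys are partitioned into blocks, and each query attends only within its top-k scoring blocks. The sketch below is a hedged generic version (mean-pooled block keys as routing scores, no causal masking) and simplifies away details of the published method:

```python
# Top-k block attention sketch: score key blocks per query, then attend
# only inside the chosen blocks. Simplified and non-causal; not the
# published MoBA implementation.
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block_size=16, top_k=2):
    seq_len, d = k.shape                              # q, k, v: (seq_len, d)
    n_blocks = seq_len // block_size                  # assumes exact division
    block_keys = k.view(n_blocks, block_size, d).mean(dim=1)
    keep = (q @ block_keys.T).topk(top_k, dim=-1).indices

    # Expose only the selected blocks in a (seq_len, seq_len) mask.
    mask = torch.zeros(seq_len, n_blocks, dtype=torch.bool)
    mask = mask.scatter(1, keep, True).repeat_interleave(block_size, dim=1)

    scores = (q @ k.T) / d**0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(64, 32)
print(block_sparse_attention(q, k, v).shape)          # torch.Size([64, 32])
```
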
#17

Consistency Models Made Easy

Zhengyang Geng, Ashwini Pokle, Weijian Luo et al.

ICLR 2025
81 citations
#18

EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling

Haiyang Liu, Zihao Zhu, Giorgio Becherini et al.

CVPR 2024
78 citations
#19

MMTEB: Massive Multilingual Text Embedding Benchmark

Kenneth Enevoldsen, Isaac Chung, Imene Kerboua et al.

ICLR 2025 · arXiv:2502.13595
text embedding evaluation, multilingual benchmarks, instruction following tasks, long-document retrieval +4
74 citations
#20

Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders

Yaohua Zha, Huizhen Ji, Jinmin Li et al.

AAAI 2024 · arXiv:2312.10726
masked autoencoders, 3d representation learning, point cloud pre-training, transformer encoder +4
61 citations
#21

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Zhaorun Chen, Zichen Wen, Yichao Du et al.

NeurIPS 2025 · arXiv:2407.04842
multimodal reward models, text-to-image generation, preference dataset, image generation models +4
57 citations
#22

What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?

Guangkai Xu, Yongtao Ge, Mingyu Liu et al.

ICLR 2025 · arXiv:2403.06090
diffusion models, dense perception tasks, monocular depth estimation, surface normal estimation +4
56 citations
#23

RE-Bench: Evaluating Frontier AI R&D Capabilities of Language Model Agents against Human Experts

Hjalmar Wijk, Tao Lin, Joel Becker et al.

ICML 2025
56 citations
#24

Inductive Moment Matching

Linqi Zhou, Stefano Ermon, Jiaming Song

ICML 2025
54 citations
#25

FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference

Xunhao Lai, Jianqiao Lu, Yao Luo et al.

ICLR 2025 · arXiv:2502.20766
attention mechanism, sparse attention, long-sequence inference, query-aware patterns +2
51 citations
#26

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability

Adam Karvonen, Can Rager, Johnny Lin et al.

ICML 2025
51 citations
#27

CoMo: Controllable Motion Generation through Language Guided Pose Code Editing

Yiming Huang, Weilin Wan, Yue Yang et al.

ECCV 2024 · arXiv:2403.13900
text-to-motion generation, controllable motion editing, discrete pose codes, large language models +4
48 citations
#28

OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers

Han Liang, Jiacheng Bao, Ruichi Zhang et al.

CVPR 2024
47 citations
#29

TabM: Advancing tabular deep learning with parameter-efficient ensembling

Yury Gorishniy, Akim Kotelnikov, Artem Babenko

ICLR 2025
45 citations
#30

SEPT: Towards Efficient Scene Representation Learning for Motion Prediction

Zhiqian Lan, Yuxuan Jiang, Yao Mu et al.

ICLR 2024
45 citations
#31

Cross-Layer and Cross-Sample Feature Optimization Network for Few-Shot Fine-Grained Image Classification

Zhen-Xiang Ma, Zhen-Duo Chen, Li-Jun Zhao et al.

AAAI 2024
45 citations
#32

S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data

Xuyang Li, Danfeng Hong, Jocelyn Chanussot

CVPR 2024
44 citations
#33

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

Keisuke Kamahori, Tian Tang, Yile Gu et al.

ICLR 2025
43 citations
#34

Theory on Mixture-of-Experts in Continual Learning

Hongbo Li, Sen Lin, Lingjie Duan et al.

ICLR 2025 · arXiv:2406.16437
continual learning, catastrophic forgetting, mixture-of-experts, overparameterized linear regression +3
40 citations
#35

Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking

Heli Ben-Hamu, Itai Gat, Daniel Severo et al.

NeurIPS 2025
40 citations
#36

Self-Evolving Multi-Agent Collaboration Networks for Software Development

Yue Hu, Yuzhu Cai, Yaxin Du et al.

ICLR 2025
40 citations
#37

Multi-Architecture Multi-Expert Diffusion Models

Yunsung Lee, Jin-Young Kim, Hyojun Go et al.

AAAI 2024 · arXiv:2306.04990
diffusion models, multi-expert models, attention mechanism, image generation +3
39 citations
#38

HSEvo: Elevating Automatic Heuristic Design with Diversity-Driven Harmony Search and Genetic Algorithm Using LLMs

Pham Vu Tuan Dat, Long Doan, Huynh Thi Thanh Binh

AAAI 2025
39 citations
#39

STanHop: Sparse Tandem Hopfield Model for Memory-Enhanced Time Series Prediction

Yu-Hsuan Wu, Jerry Hu, Weijian Li et al.

ICLR 2024
39 citations
#40

TESTAM: A Time-Enhanced Spatio-Temporal Attention Model with Mixture of Experts

Hyunwook Lee, Sungahn Ko

ICLR 2024
38 citations
#41

SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction

Yang Zhou, Hao Shao, Letian Wang et al.

CVPR 2024
38 citations
#42

MoH: Multi-Head Attention as Mixture-of-Head Attention

Peng Jin, Bo Zhu, Li Yuan et al.

ICML 2025
37 citations
#43

STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?

Yun Li, Yiming Zhang, Tao Lin et al.

ICCV 2025
36 citations
#44

NeuSurf: On-Surface Priors for Neural Surface Reconstruction from Sparse Input Views

Han Huang, Yulun Wu, Junsheng Zhou et al.

AAAI 2024 · arXiv:2312.13977
neural implicit functions, multi-view reconstruction, sparse view reconstruction, surface reconstruction +3
35 citations
#45

Towards Energy Efficient Spiking Neural Networks: An Unstructured Pruning Framework

Xinyu Shi, Jianhao Ding, Zecheng Hao et al.

ICLR 2024
35 citations
#46

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

Fangxun Shu, Yue Liao, Lei Zhang et al.

ICLR 2025
34 citations
#47

COMBO: Compositional World Models for Embodied Multi-Agent Cooperation

Hongxin Zhang, Zeyuan Wang, Qiushi Lyu et al.

ICLR 2025
33 citations
#48

MoAI: Mixture of All Intelligence for Large Language and Vision Models

Byung-Kwan Lee, Beomchan Park, Chae Won Kim et al.

ECCV 2024 · arXiv:2403.07508
instruction-tuned llvms, visual perception tasks, scene graph generation, optical character recognition +4
33 citations
#49

Spurious Feature Diversification Improves Out-of-distribution Generalization

Yong Lin, Lu Tan, Yifan Hao et al.

ICLR 2024
33 citations
#50

Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shape Modeling

Zhihao Li, Yufei Wang, Heliang Zheng et al.

NeurIPS 2025
32 citations
#51

Frequency-Adaptive Pan-Sharpening with Mixture of Experts

Xuanhua He, Keyu Yan, Rui Li et al.

AAAI 2024 · arXiv:2401.02151
pan-sharpening, frequency domain processing, adaptive frequency separation, mixture of experts +4
32 citations
#52

SCBench: A KV Cache-Centric Analysis of Long-Context Methods

Yucheng Li, Huiqiang Jiang, Qianhui Wu et al.

ICLR 2025
32 citations
#53

Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation

Yiming Wang, Pei Zhang, Baosong Yang et al.

ICLR 2025
32 citations
#54

RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation

Haiming Zhang, Xu Yan, Dongfeng Bai et al.

AAAI 2024 · arXiv:2312.11829
3d occupancy prediction, cross-modal knowledge distillation, multi-view images, volume rendering +4
31 citations
#55

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts

Peng Jin, Bo Zhu, Li Yuan et al.

ICLR 2025
31 citations
#56

MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views

Wangze Xu, Huachen Gao, Shihe Shen et al.

ECCV 2024
30 citations
#57

Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models

Thomas Fel, Ekdeep Singh Lubana, Jacob Prince et al.

ICML 2025
28 citations
#58

Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency

Shiji Zhao, Ranjie Duan, Fengxiang Wang et al.

ICCV 2025 · arXiv:2501.04931
jailbreak attacks, multimodal large language models, safety mechanism vulnerabilities, shuffle inconsistency +4
28 citations
#59

DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification

Yuhao Wang, Yang Liu, Aihua Zheng et al.

AAAI 2025
27 citations
#60

Your Mixture-of-Experts LLM Is Secretly an Embedding Model for Free

Ziyue Li, Tianyi Zhou

ICLR 2025 · arXiv:2410.10814
mixture-of-experts llms, embedding models, expert routers, routing weights +4
27 citations
#61

Sparse autoencoders reveal selective remapping of visual concepts during adaptation

Hyesu Lim, Jinho Choi, Jaegul Choo et al.

ICLR 2025
27 citations
#62

MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors

Qingming Liu, Yuan Liu, Jiepeng Wang et al.

ICLR 2025
26 citations
#63

KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems

Jusheng Zhang, Zimeng Huang, Yijia Fan et al.

ICML 2025
26 citations
#64

UMBRAE: Unified Multimodal Brain Decoding

Weihao Xia, Raoul de Charette, Cengiz Oztireli et al.

ECCV 2024
26 citations
#65

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

Baichuan Zhou, Haote Yang, Dairong Chen et al.

AAAI 2025
26 citations
#66

Efficient Inference of Vision Instruction-Following Models with Elastic Cache

Zuyan Liu, Benlin Liu, Jiahui Wang et al.

ECCV 2024
25 citations
#67

Efficient Deweather Mixture-of-Experts with Uncertainty-Aware Feature-Wise Linear Modulation

Rongyu Zhang, Yulin Luo, Jiaming Liu et al.

AAAI 2024
25 citations
#68

MoDE: CLIP Data Experts via Clustering

Jiawei Ma, Po-Yao Huang, Saining Xie et al.

CVPR 2024
25 citations
#69

Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment

Harrish Thasarathan, Julian Forsyth, Thomas Fel et al.

ICML 2025
24 citations
#70

MUSE-VL: Modeling Unified VLM through Semantic Discrete Encoding

Rongchang Xie, Chen Du, Ping Song et al.

ICCV 2025
24 citations
#71

D₂O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models

Zhongwei Wan, Xinjian Wu, Yu Zhang et al.

ICLR 2025
23 citations
#72

Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles

Jiangjie Chen, Qianyu He, Siyu Yuan et al.

NeurIPS 2025
23 citations
#73

Mixture Compressor for Mixture-of-Experts LLMs Gains More

Wei Huang, Yue Liao, Jianhui Liu et al.

ICLR 2025
23 citations
#74

The AdEMAMix Optimizer: Better, Faster, Older

Matteo Pagliardini, Pierre Ablin, David Grangier

ICLR 2025
23 citations
#75

Improving Medical Multi-modal Contrastive Learning with Expert Annotations

Yogesh Kumar, Pekka Marttinen

ECCV 2024
22 citations
#76

DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance

Yuxuan Luo, Zhengkun Rong, Lizhen Wang et al.

ICCV 2025
22 citations
#77

Hybrid-Supervised Dual-Search: Leveraging Automatic Learning for Loss-Free Multi-Exposure Image Fusion

Guanyao Wu, Hongming Fu, Jinyuan Liu et al.

AAAI 2024 · arXiv:2309.01113
multi-exposure image fusion, neural architecture search, loss function search, hybrid supervision +4
22 citations
#78

MOFDiff: Coarse-grained Diffusion for Metal-Organic Framework Design

Xiang Fu, Tian Xie, Andrew Rosen et al.

ICLR 2024
21 citations
#79

Pathologies of Predictive Diversity in Deep Ensembles

Geoff Pleiss, Taiga Abe, E. Kelly Buchanan et al.

ICLR 2024
21 citations
#80

MC²: Multi-concept Guidance for Customized Multi-concept Generation

Jiaxiu Jiang, Yabo Zhang, Kailai Feng et al.

CVPR 2025 · arXiv:2404.05268
customized text-to-image generation, multi-concept customization, inference-time optimization, attention weight refinement +3
21 citations
#81

Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget

Johannes Lehner, Benedikt Alkin, Andreas Fürst et al.

AAAI 2024 · arXiv:2304.10520
masked image modeling, masked autoencoders, instance discrimination, contrastive tuning +4
21 citations
#82

Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs

Barrett Tang, Zile Huang, Chengzhi Liu et al.

ICLR 2025
attention mechanism, multimodal large language models, hallucination reduction, self-attention patterns +3
20 citations
#83

EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE

Junyi Chen, Longteng Guo, Jia Sun et al.

AAAI 2024 · arXiv:2308.11971
vision-language pre-training, masked signal modeling, multimodal transformer, modality-aware moe +3
20 citations
#84

ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis

Kensen Shi, Joey Hong, Yinlin Deng et al.

ICLR 2024
20 citations
#85

GOAL: A Generalist Combinatorial Optimization Agent Learner

Darko Drakulić, Sofia Michel, Jean-Marc Andreoli

ICLR 2025
20 citations
#86

Hyper-Connections

Defa Zhu, Hongzhi Huang, Zihao Huang et al.

ICLR 2025
20 citations
#87

ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models

Yi-Lin Sung, Jaehong Yoon, Mohit Bansal

ICLR 2024
20 citations
#88

Effective Interplay between Sparsity and Quantization: From Theory to Practice

Simla Harma, Ayan Chakraborty, Elizaveta Kostenok et al.

ICLR 2025
19 citations
#89

Adaptive Message Passing: A General Framework to Mitigate Oversmoothing, Oversquashing, and Underreaching

Federico Errica, Henrik Christiansen, Viktor Zaverkin et al.

ICML 2025
19 citations
#90

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

Junmo Kang, Leonid Karlinsky, Hongyin Luo et al.

ICLR 2025
19 citations
#91

Merging on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging

Anke Tang, Enneng Yang, Li Shen et al.

NeurIPS 2025
18 citations
#92

Exploring Diverse Representations for Open Set Recognition

Yu Wang, Junxian Mu, Pengfei Zhu et al.

AAAI 2024 · arXiv:2401.06521
open set recognition, attention diversity regularization, multi-expert fusion, discriminative models +4
18 citations
#93

TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model

Cheng Yang, Yang Sui, Jinqi Xiao et al.

CVPR 2025
18 citations
#94

Delta Decompression for MoE-based LLMs Compression

Hao Gu, Wei Li, Lujun Li et al.

ICML 2025
18 citations
#95

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Ziyu Liu, Yuhang Zang, Xiaoyi Dong et al.

ICLR 2025
18 citations
#96

Unsupervised Layer-Wise Score Aggregation for Textual OOD Detection

Maxime Darrin, Guillaume Staerman, Eduardo Dadalto Camara Gomes et al.

AAAI 2024 · arXiv:2302.09852
out-of-distribution detection, anomaly score aggregation, layer-wise representations, textual ood benchmarks +3
17 citations
#97

UniM2AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving

Jian Zou, Tianyu Huang, Guanglei Yang et al.

ECCV 2024
17 citations
#98

InfMAE: A Foundation Model in The Infrared Modality

Fangcen Liu, Chenqiang Gao, Yaming Zhang et al.

ECCV 2024
17 citations
#99

CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs

Jinlan Fu, Shenzhen Huangfu, Hao Fei et al.

ICLR 2025
17 citations
#100

SFPNet: Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds

Yanbo Wang, Wentao Zhao, Cao Chuan et al.

ECCV 2024
17 citations