2024 "multi-head self-attention" Papers
3 papers found
LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models
Guangyan Li, Yongqiang Tang, Wensheng Zhang
ICML 2024 poster · arXiv:2404.09695
SNP: Structured Neuron-level Pruning to Preserve Attention Scores
Kyunghwan Shim, Jaewoong Yun, Shinkook Choi
ECCV 2024 poster · arXiv:2404.11630
2 citations
Vision Transformers as Probabilistic Expansion from Learngene
Qiufeng Wang, Xu Yang, Haokun Chen et al.
ICML 2024 poster