DuSA: Fast and Accurate Dual-Stage Sparse Attention Mechanism Accelerating Both Training and Inference

Citations: 0
NeurIPS 2025 rank: #1334 of 5858 papers
Authors: 7

Abstract

This paper proposes the Dual-Stage Sparse Attention (DuSA) mechanism to accelerate attention in transformers. In the first stage, DuSA performs intra-block sparse attention to aggregate local inductive biases; in the second stage, it performs inter-block sparse attention to capture long-range dependencies. Both stages have low computational complexity and can be directly accelerated further by memory-efficient attention mechanisms, which makes DuSA faster than even extremely fast attention mechanisms. The dual-stage design approximates vanilla scaled dot-product attention with lower error than basic single-stage sparse attention mechanisms, and it advances those basic mechanisms to match or even outperform vanilla scaled dot-product attention. Even in plug-and-play settings, DuSA maintains low performance loss. DuSA can be used to accelerate both training and inference. It achieves leading performance on a range of benchmarks, including Long Range Arena, image classification, semantic segmentation, object detection, text-to-video generation, and long-context understanding, and it accelerates models of different sizes.
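The abstract describes the two stages only at a high level, so the PyTorch sketch below is an illustrative reading rather than the authors' implementation: intra-block attention is computed within fixed-size blocks, and inter-block attention is approximated by letting every query attend to per-block summaries. The block size, the mean-pooling used to form those summaries, and the additive combination of the two stages are all assumptions made for this sketch.

# Minimal, hedged sketch of a dual-stage (intra-block + inter-block) sparse
# attention pass. Requires PyTorch >= 2.0 for F.scaled_dot_product_attention.
# Block size, mean-pooled block summaries, and the additive combination are
# illustrative assumptions, not the paper's actual DuSA design.
import torch
import torch.nn.functional as F


def dual_stage_sparse_attention(q, k, v, block_size=64):
    """q, k, v: (batch, heads, seq_len, head_dim); seq_len divisible by block_size."""
    b, h, n, d = q.shape
    nb = n // block_size

    # Stage 1: intra-block attention captures local structure.
    qb = q.view(b, h, nb, block_size, d)
    kb = k.view(b, h, nb, block_size, d)
    vb = v.view(b, h, nb, block_size, d)
    local = F.scaled_dot_product_attention(qb, kb, vb)  # (b, h, nb, block, d)

    # Stage 2: inter-block attention captures long-range dependencies by
    # attending from every token to one summary token per block (assumption:
    # summaries are mean-pooled keys/values).
    k_sum = kb.mean(dim=3)  # (b, h, nb, d)
    v_sum = vb.mean(dim=3)
    global_ = F.scaled_dot_product_attention(q, k_sum, v_sum)  # (b, h, n, d)

    # Combine the two stages; a simple sum is used here for illustration.
    return local.reshape(b, h, n, d) + global_


if __name__ == "__main__":
    q = torch.randn(2, 4, 256, 32)
    k = torch.randn(2, 4, 256, 32)
    v = torch.randn(2, 4, 256, 32)
    print(dual_stage_sparse_attention(q, k, v).shape)  # torch.Size([2, 4, 256, 32])

Under these assumptions, neither stage is quadratic in the full sequence length: the intra-block stage costs on the order of n times the block size, the inter-block stage on the order of n times the number of blocks, and each inner call is a dense attention that fused memory-efficient kernels can accelerate directly, consistent with the abstract's claims.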

Citation History

Jan 26, 2026: 0
Jan 27, 2026: 0
Feb 2, 2026: 0