"transformer acceleration" Papers
2 papers found
DuSA: Fast and Accurate Dual-Stage Sparse Attention Mechanism Accelerating Both Training and Inference
Chong Wu, Jiawang Cao, Renjie Xu et al.
NeurIPS 2025 (poster)
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Di Liu, Meng Chen, Baotong Lu et al.
NeurIPS 2025 (poster) · arXiv:2409.10516
83 citations