by Jia wei Papers
4 papers found
SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization
Jintao Zhang, Haofeng Huang, Pengle Zhang et al.
ICML 2025poster
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training
Jintao Zhang, Jia wei, Haoxu Wang et al.
NeurIPS 2025spotlight
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Jintao Zhang, Jia wei, Pengle Zhang et al.
ICLR 2025poster
SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference
Jintao Zhang, Chendong Xiang, Haofeng Huang et al.
ICML 2025poster