Heejun Lee
3
Papers
13
Total Citations
Papers (3)
A Training-Free Sub-quadratic Cost Transformer Model Serving Framework with Hierarchically Pruned Attention
ICLR 2025, arXiv
8
citations
Training Free Exponential Context Extension via Cascading KV Cache
ICLR 2025
3
citations
Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction
NeurIPS 2025
2
citations