"large language model inference" Papers
2 papers found
DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
Jinwei Yao, Kaiqi Chen, Kexun Zhang et al.
ICLR 2025 (poster) · arXiv:2404.00242
8 citations
Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity
Susav Shrestha, Bradley Settlemyer, Nikoli Dryden et al.
NeurIPS 2025 (poster) · arXiv:2505.14884
3 citations