Poster "high throughput inference" Papers
2 papers found
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
Ranajoy Sadhukhan, Jian Chen, Zhuoming Chen et al.
ICLR 2025posterarXiv:2408.11049
61
citations
Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity
Susav Shrestha, Bradley Settlemyer, Nikoli Dryden et al.
NeurIPS 2025posterarXiv:2505.14884
3
citations