NEURIPS "long-context inference" Papers
3 papers found
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
Xiang Liu, Zhenheng Tang, Peijie Dong et al.
NeurIPS 2025 (poster) · arXiv:2502.00299 · 15 citations
KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments
Junyoung Park, Dalton Jones, Matthew Morse et al.
NeurIPS 2025 (poster) · arXiv:2504.15364 · 11 citations
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Di Liu, Meng Chen, Baotong Lu et al.
NeurIPS 2025 (poster) · arXiv:2409.10516 · 83 citations