Poster papers matching "llm inference efficiency"
6 papers found

CodeGEMM: A Codebook-Centric Approach to Efficient GEMM in Quantized LLMs
Gunho Park, Jeongin Bae, Byeongwook Kim et al.
NeurIPS 2025 (poster) · arXiv:2512.17970

KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems
Hancheng Ye, Zhengqi Gao, Mingyuan Ma et al.
NeurIPS 2025 (poster) · arXiv:2510.12872
1 citation

STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
Peijie Dong, Lujun Li, Yuedong Zhong et al.
ICLR 2025 (poster) · arXiv:2408.01803
31 citations

CLLMs: Consistency Large Language Models
Siqi Kou, Lanxiang Hu, Zhezhi He et al.
ICML 2024 (poster)

Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
Harry Dong, Xinyu Yang, Zhenyu Zhang et al.
ICML 2024 (poster)

Online Cascade Learning for Efficient Inference over Streams
Lunyiu Nie, Zhimin Ding, Erdong Hu et al.
ICML 2024 (poster)