NEURIPS Poster "llm inference efficiency" Papers
2 papers found
CodeGEMM: A Codebook-Centric Approach to Efficient GEMM in Quantized LLMs
Gunho Park, Jeongin Bae, Byeongwook Kim et al.
NEURIPS 2025 · poster · arXiv:2512.17970
KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems
Hancheng Ye, Zhengqi Gao, Mingyuan Ma et al.
NEURIPS 2025 · poster · arXiv:2510.12872
1 citation