Poster "kv cache compression" Papers
7 papers found
Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression
Kunjun Li, Zigeng Chen, Cheng-Yen Yang et al.
NeurIPS 2025posterarXiv:2505.19602
6
citations
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Siyuan Li, Luyuan Zhang, Zedong Wang et al.
CVPR 2025posterarXiv:2504.00999
6
citations
Retrieval Head Mechanistically Explains Long-Context Factuality
Wenhao Wu, Yizhong Wang, Guangxuan Xiao et al.
ICLR 2025posterarXiv:2404.15574
140
citations
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
Zhenheng Tang, Xiang Liu, Qian Wang et al.
ICLR 2025posterarXiv:2502.17535
10
citations
CaM: Cache Merging for Memory-efficient LLMs Inference
Yuxin Zhang, Yuxuan Du, Gen Luo et al.
ICML 2024poster
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
Harry Dong, Xinyu Yang, Zhenyu Zhang et al.
ICML 2024poster
LoCoCo: Dropping In Convolutions for Long Context Compression
Ruisi Cai, Yuandong Tian, Zhangyang “Atlas” Wang et al.
ICML 2024poster