2025 "memory efficiency" Papers
6 papers found
Accurate KV Cache Eviction via Anchor Direction Projection for Efficient LLM Inference
Zijie Geng, Jie Wang, Ziqi Liu et al.
NeurIPS 2025poster
Ask and Remember: A Questions-Only Replay Strategy for Continual Visual Question Answering
Imad Eddine MAROUF, Enzo Tartaglione, Stéphane Lathuilière et al.
ICCV 2025posterarXiv:2502.04469
1
citations
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Guangxuan Xiao, Jiaming Tang, Jingwei Zuo et al.
ICLR 2025posterarXiv:2410.10819
165
citations
SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models
Muyang Li, Yujun Lin, Zhekai Zhang et al.
ICLR 2025posterarXiv:2411.05007
90
citations
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
Maximilian Beck, Korbinian Pöppel, Phillip Lippe et al.
NeurIPS 2025posterarXiv:2503.14376
8
citations
Variational Bayesian Pseudo-Coreset
Hyungi Lee, Seungyoo Lee, Juho Lee
ICLR 2025posterarXiv:2502.21143