ICLR 2025 "memory efficiency" Papers
3 papers found
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Guangxuan Xiao, Jiaming Tang, Jingwei Zuo et al.
ICLR 2025 poster, arXiv:2410.10819
165 citations
SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models
Muyang Li, Yujun Lin, Zhekai Zhang et al.
ICLR 2025 poster, arXiv:2411.05007
90 citations
Variational Bayesian Pseudo-Coreset
Hyungi Lee, Seungyoo Lee, Juho Lee
ICLR 2025 poster, arXiv:2502.21143