ICML 2024 "memory efficiency" Papers
4 papers found
CHAI: Clustered Head Attention for Efficient LLM Inference
Saurabh Agarwal, Bilge Acun, Basil Hosmer et al.
ICML 2024 poster
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
Piotr Nawrot, Adrian Łańcucki, Marcin Chochowski et al.
ICML 2024 poster
Memory Efficient Neural Processes via Constant Memory Attention Block
Leo Feng, Frederick Tung, Hossein Hajimirsadeghi et al.
ICML 2024 poster
REST: Efficient and Accelerated EEG Seizure Analysis through Residual State Updates
Arshia Afzal, Grigorios Chrysos, Volkan Cevher et al.
ICML 2024 oral