ICLR 2025 "llm inference efficiency" Papers
3 papers found
Progressive Mixed-Precision Decoding for Efficient LLM Inference
Hao (Mark) Chen, Fuwen Tan, Alexandros Kouris et al.
ICLR 2025 (poster) · arXiv:2410.13461
8 citations
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
Hanlin Tang, Yang Lin, Jing Lin et al.
ICLR 2025 (poster) · arXiv:2407.15891
59 citations
STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
Peijie Dong, Lujun Li, Yuedong Zhong et al.
ICLR 2025 (poster) · arXiv:2408.01803
31 citations