Poster "llm decoding" Papers
2 papers found
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Guangxuan Xiao, Jiaming Tang, Jingwei Zuo et al.
ICLR 2025 (poster) · arXiv:2410.10819 · 165 citations
Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs
Minh Nguyen, Andrew Baker, Clement Neo et al.
ICLR 2025 (poster) · arXiv:2407.01082 · 82 citations