"latency reduction" Papers
8 papers found
Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents
Qizheng Zhang, Michael Wornow, Kunle Olukotun
NeurIPS 2025posterarXiv:2506.14852
7
citations
Fourier Token Merging: Understanding and Capitalizing Frequency Domain for Efficient Image Generation
Jiesong Liu, Xipeng Shen
NeurIPS 2025poster
SemanticDraw: Towards Real-Time Interactive Content Creation from Image Diffusion Models
Jaerin Lee, Daniel Jung, Kanggeon Lee et al.
CVPR 2025posterarXiv:2403.09055
3
citations
SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning
Rui Pan, Yinwei Dai, Zhihao Zhang et al.
NeurIPS 2025posterarXiv:2504.07891
35
citations
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Zilong (Ryan) Wang, Zifeng Wang, Long Le et al.
ICLR 2025posterarXiv:2407.08223
75
citations
Think Only When You Need with Large Hybrid-Reasoning Models
Lingjie Jiang, Xun Wu, Shaohan Huang et al.
NeurIPS 2025posterarXiv:2505.14631
35
citations
An LLM Compiler for Parallel Function Calling
Sehoon Kim, Suhong Moon, Ryan Tabrizi et al.
ICML 2024posterarXiv:2312.04511
Online Speculative Decoding
Xiaoxuan Liu, Lanxiang Hu, Peter Bailis et al.
ICML 2024posterarXiv:2310.07177