2025 Poster "llm serving optimization" Papers
2 papers found
Block-Diagonal LoRA for Eliminating Communication Overhead in Tensor Parallel LoRA Serving
Xinyu Wang, Jonas M. Kübler, Kailash Budhathoki et al.
NeurIPS 2025 poster, arXiv:2510.23346
Transcending Cost-Quality Tradeoff in Agent Serving via Session-Awareness
Yanyu Ren, Li Chen, Dan Li et al.
NeurIPS 2025 poster