Papers matching "communication overhead"
3 papers found
Accelerating Parallel Diffusion Model Serving with Residual Compression
Jiajun Luo, Yicheng Xiao, Jianru Xu et al.
NeurIPS 2025 (oral) · arXiv:2507.17511
Block-Diagonal LoRA for Eliminating Communication Overhead in Tensor Parallel LoRA Serving
Xinyu Wang, Jonas M. Kübler, Kailash Budhathoki et al.
NeurIPS 2025 (poster) · arXiv:2510.23346
First Attentions Last: Better Exploiting First Attentions for Efficient Parallel Training
Gyudong Kim, Hyukju Na, Jin Kim et al.
NeurIPS 2025 (poster)