NEURIPS "efficient inference" Papers
4 papers found
ALTER: All-in-One Layer Pruning and Temporal Expert Routing for Efficient Diffusion Generation
Xiaomeng Yang, LEI LU, Qihui Fan et al.
NEURIPS 2025 · Oral · arXiv:2505.21817
Plug-and-Play Context Feature Reuse for Efficient Masked Generation
Xuejie Liu, Anji Liu, Guy Van den Broeck et al.
NEURIPS 2025 · Poster · arXiv:2505.19089 · 3 citations
PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation
Ao Wang, Hui Chen, Jianchao Tan et al.
NEURIPS 2025 · Poster · arXiv:2412.03409 · 5 citations
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
Tianyu Fu, Yi Ge, Yichen You et al.
NEURIPS 2025 · Poster · arXiv:2505.21600 · 11 citations