NEURIPS 2025 "inference acceleration" Papers
13 papers found
Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism
Kunyun Wang, Bohan Li, Kai Yu et al.
NEURIPS 2025 poster · arXiv:2505.14741 · 1 citation
dKV-Cache: The Cache for Diffusion Language Models
Xinyin Ma, Runpeng Yu, Gongfan Fang et al.
NEURIPS 2025 poster · arXiv:2505.15781 · 66 citations
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Yuhui Li, Fangyun Wei, Chao Zhang et al.
NEURIPS 2025 poster · arXiv:2503.01840 · 102 citations
Encoder-Decoder Diffusion Language Models for Efficient Training and Inference
Marianne Arriola, Yair Schiff, Hao Phung et al.
NEURIPS 2025 poster · arXiv:2510.22852 · 1 citation
GRIFFIN: Effective Token Alignment for Faster Speculative Decoding
Shijing Hu, Jingyang Li, Xingyu Xie et al.
NEURIPS 2025 poster · arXiv:2502.11018 · 3 citations
HoliTom: Holistic Token Merging for Fast Video Large Language Models
Kele Shao, Keda TAO, Can Qin et al.
NEURIPS 2025 oral · arXiv:2505.21334 · 18 citations
KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse
Jingbo Yang, Bairu Hou, Wei Wei et al.
NEURIPS 2025 poster · arXiv:2502.16002 · 24 citations
Language Models Can Predict Their Own Behavior
Dhananjay Ashok, Jonathan May
NEURIPS 2025 poster · arXiv:2502.13329 · 5 citations
L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models
Xiaohao Liu, Xiaobo Xia, Weixiang Zhao et al.
NEURIPS 2025 poster · arXiv:2505.17505 · 5 citations
Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation
Yao Teng, Fu-Yun Wang, Xian Liu et al.
NEURIPS 2025 poster · arXiv:2510.08994
TransMLA: Migrating GQA Models to MLA with Full DeepSeek Compatibility and Speedup
Fanxu Meng, Pingzhi Tang, Zengwei Yao et al.
NEURIPS 2025 spotlight
ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding
Jialiang Kang, Han Shu, Wenshuo Li et al.
NEURIPS 2025 poster · arXiv:2509.15235 · 2 citations
VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching
Siyu Xu, Yunke Wang, Chenghao Xia et al.
NEURIPS 2025 oral · arXiv:2502.02175 · 27 citations