NEURIPS 2025 "inference acceleration" Papers

13 papers found

Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism

Kunyun Wang, Bohan Li, Kai Yu et al.

NEURIPS 2025 | poster | arXiv:2505.14741 | 1 citation

dKV-Cache: The Cache for Diffusion Language Models

Xinyin Ma, Runpeng Yu, Gongfan Fang et al.

NEURIPS 2025 | poster | arXiv:2505.15781 | 66 citations

EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test

Yuhui Li, Fangyun Wei, Chao Zhang et al.

NEURIPS 2025 | poster | arXiv:2503.01840 | 102 citations

Encoder-Decoder Diffusion Language Models for Efficient Training and Inference

Marianne Arriola, Yair Schiff, Hao Phung et al.

NEURIPS 2025 | poster | arXiv:2510.22852 | 1 citation

GRIFFIN: Effective Token Alignment for Faster Speculative Decoding

Shijing Hu, Jingyang Li, Xingyu Xie et al.

NEURIPS 2025 | poster | arXiv:2502.11018 | 3 citations

HoliTom: Holistic Token Merging for Fast Video Large Language Models

Kele Shao, Keda Tao, Can Qin et al.

NEURIPS 2025 | oral | arXiv:2505.21334 | 18 citations

KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse

Jingbo Yang, Bairu Hou, Wei Wei et al.

NEURIPS 2025 | poster | arXiv:2502.16002 | 24 citations

Language Models Can Predict Their Own Behavior

Dhananjay Ashok, Jonathan May

NEURIPS 2025 | poster | arXiv:2502.13329 | 5 citations

L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models

Xiaohao Liu, Xiaobo Xia, Weixiang Zhao et al.

NEURIPS 2025 | poster | arXiv:2505.17505 | 5 citations

Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation

Yao Teng, Fu-Yun Wang, Xian Liu et al.

NEURIPS 2025 | poster | arXiv:2510.08994

TransMLA: Migrating GQA Models to MLA with Full DeepSeek Compatibility and Speedup

Fanxu Meng, Pingzhi Tang, Zengwei Yao et al.

NEURIPS 2025 | spotlight

ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding

Jialiang Kang, Han Shu, Wenshuo Li et al.

NEURIPS 2025 | poster | arXiv:2509.15235 | 2 citations

VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching

Siyu Xu, Yunke Wang, Chenghao Xia et al.

NEURIPS 2025 | oral | arXiv:2502.02175 | 27 citations