NeurIPS Poster "inference acceleration" Papers
10 papers found
Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism
Kunyun Wang, Bohan Li, Kai Yu et al.
NeurIPS 2025 (poster) · arXiv:2505.14741 · 1 citation
dKV-Cache: The Cache for Diffusion Language Models
Xinyin Ma, Runpeng Yu, Gongfan Fang et al.
NeurIPS 2025 (poster) · arXiv:2505.15781 · 66 citations
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Yuhui Li, Fangyun Wei, Chao Zhang et al.
NeurIPS 2025 (poster) · arXiv:2503.01840 · 102 citations
Encoder-Decoder Diffusion Language Models for Efficient Training and Inference
Marianne Arriola, Yair Schiff, Hao Phung et al.
NeurIPS 2025 (poster) · arXiv:2510.22852 · 1 citation
GRIFFIN: Effective Token Alignment for Faster Speculative Decoding
Shijing Hu, Jingyang Li, Xingyu Xie et al.
NeurIPS 2025 (poster) · arXiv:2502.11018 · 3 citations
KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse
Jingbo Yang, Bairu Hou, Wei Wei et al.
NeurIPS 2025 (poster) · arXiv:2502.16002 · 24 citations
Language Models Can Predict Their Own Behavior
Dhananjay Ashok, Jonathan May
NeurIPS 2025 (poster) · arXiv:2502.13329 · 5 citations
L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models
Xiaohao Liu, Xiaobo Xia, Weixiang Zhao et al.
NeurIPS 2025 (poster) · arXiv:2505.17505 · 5 citations
Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation
Yao Teng, Fu-Yun Wang, Xian Liu et al.
NeurIPS 2025 (poster) · arXiv:2510.08994
ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding
Jialiang Kang, Han Shu, Wenshuo Li et al.
NeurIPS 2025 (poster) · arXiv:2509.15235 · 2 citations