2025 "inference acceleration" Papers
17 papers found
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time
Justin Deschenaux, Caglar Gulcehre
ICLR 2025 (poster), arXiv:2410.21035
25 citations
Distributed Speculative Inference (DSI): Speculation Parallelism for Provably Faster Lossless Language Model Inference
Nadav Timor, Jonathan Mamou, Daniel Korat et al.
ICLR 2025 (poster), arXiv:2405.14105
7 citations
Encoder-Decoder Diffusion Language Models for Efficient Training and Inference
Marianne Arriola, Yair Schiff, Hao Phung et al.
NeurIPS 2025 (poster), arXiv:2510.22852
1 citation
FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation
Tianyun Zhong, Chao Liang, Jianwen Jiang et al.
CVPR 2025 (poster), arXiv:2412.16915
5 citations
Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond
Costin-Andrei Oncescu, Sanket Jayant Purandare, Stratos Idreos et al.
ICLR 2025 (poster), arXiv:2410.12982
2 citations
Grouped Speculative Decoding for Autoregressive Image Generation
Junhyuk So, Juncheol Shin, Hyunho Kook et al.
ICCV 2025 (poster), arXiv:2508.07747
3 citations
HoliTom: Holistic Token Merging for Fast Video Large Language Models
Kele Shao, Keda Tao, Can Qin et al.
NeurIPS 2025 (oral), arXiv:2505.21334
18 citations
KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse
Jingbo Yang, Bairu Hou, Wei Wei et al.
NeurIPS 2025 (poster), arXiv:2502.16002
24 citations
Language Models Can Predict Their Own Behavior
Dhananjay Ashok, Jonathan May
NeurIPS 2025 (poster), arXiv:2502.13329
5 citations
L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models
Xiaohao Liu, Xiaobo Xia, Weixiang Zhao et al.
NeurIPS 2025 (poster), arXiv:2505.17505
5 citations
Quantization without Tears
Minghao Fu, Hao Yu, Jie Shao et al.
CVPR 2025 (poster), arXiv:2411.13918
14 citations
SemanticDraw: Towards Real-Time Interactive Content Creation from Image Diffusion Models
Jaerin Lee, Daniel Jung, Kanggeon Lee et al.
CVPR 2025 (poster), arXiv:2403.09055
3 citations
Simple ReFlow: Improved Techniques for Fast Flow Models
Beomsu Kim, Yu-Guan Hsieh, Michal Klein et al.
ICLR 2025 (poster), arXiv:2410.07815
28 citations
SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models
Muyang Li, Yujun Lin, Zhekai Zhang et al.
ICLR 2025 (poster), arXiv:2411.05007
90 citations
TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance
Minghao Fu, Guo-Hua Wang, Xiaohao Chen et al.
ICCV 2025 (poster), arXiv:2507.18192
TransMLA: Migrating GQA Models to MLA with Full DeepSeek Compatibility and Speedup
Fanxu Meng, Pingzhi Tang, Zengwei Yao et al.
NeurIPS 2025 (spotlight)
ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding
Jialiang Kang, Han Shu, Wenshuo Li et al.
NeurIPS 2025 (poster), arXiv:2509.15235
2 citations