ICLR 2025 "inference acceleration" Papers
9 papers found
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time
Justin Deschenaux, Caglar Gulcehre
ICLR 2025 (poster) · arXiv:2410.21035
25 citations
Block Verification Accelerates Speculative Decoding
Ziteng Sun, Uri Mendlovic, Yaniv Leviathan et al.
ICLR 2025 (poster) · arXiv:2403.10444
18 citations
Distributed Speculative Inference (DSI): Speculation Parallelism for Provably Faster Lossless Language Model Inference
Nadav Timor, Jonathan Mamou, Daniel Korat et al.
ICLR 2025 (poster) · arXiv:2405.14105
7 citations
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
Zhengyao Lyu, Chenyang Si, Junhao Song et al.
ICLR 2025 (oral) · arXiv:2410.19355
54 citations
Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond
Costin-Andrei Oncescu, Sanket Jayant Purandare, Stratos Idreos et al.
ICLR 2025 (poster) · arXiv:2410.12982
2 citations
ParaSolver: A Hierarchical Parallel Integral Solver for Diffusion Models
Jianrong Lu, Zhiyu Zhu, Junhui Hou
ICLR 2025 (poster)
4 citations
Simple ReFlow: Improved Techniques for Fast Flow Models
Beomsu Kim, Yu-Guan Hsieh, Michal Klein et al.
ICLR 2025 (poster) · arXiv:2410.07815
28 citations
SLMRec: Distilling Large Language Models into Small for Sequential Recommendation
Wujiang Xu, Qitian Wu, Zujie Liang et al.
ICLR 2025 (oral) · arXiv:2405.17890
17 citations
SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models
Muyang Li, Yujun Lin, Zhekai Zhang et al.
ICLR 2025 (poster) · arXiv:2411.05007
90 citations