2025 Poster "inference acceleration" Papers

17 papers found

Beyond Autoregression: Fast LLMs via Self-Distillation Through Time

Justin Deschenaux, Caglar Gulcehre

ICLR 2025posterarXiv:2410.21035
25
citations

Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs

Qizhe Zhang, Aosong Cheng, Ming Lu et al.

ICCV 2025posterarXiv:2412.01818
37
citations

Block Verification Accelerates Speculative Decoding

Ziteng Sun, Uri Mendlovic, Yaniv Leviathan et al.

ICLR 2025posterarXiv:2403.10444
18
citations

Distributed Speculative Inference (DSI): Speculation Parallelism for Provably Faster Lossless Language Model Inference

Nadav Timor, Jonathan Mamou, Daniel Korat et al.

ICLR 2025posterarXiv:2405.14105
7
citations

Encoder-Decoder Diffusion Language Models for Efficient Training and Inference

Marianne Arriola, Yair Schiff, Hao Phung et al.

NeurIPS 2025posterarXiv:2510.22852
1
citations

FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation

Tianyun Zhong, Chao Liang, Jianwen Jiang et al.

CVPR 2025posterarXiv:2412.16915
5
citations

Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond

Costin-Andrei Oncescu, Sanket Jayant Purandare, Stratos Idreos et al.

ICLR 2025posterarXiv:2410.12982
2
citations

Grouped Speculative Decoding for Autoregressive Image Generation

Junhyuk So, Juncheol Shin, Hyunho Kook et al.

ICCV 2025posterarXiv:2508.07747
3
citations

KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse

Jingbo Yang, Bairu Hou, Wei Wei et al.

NeurIPS 2025posterarXiv:2502.16002
24
citations

Language Models Can Predict Their Own Behavior

Dhananjay Ashok, Jonathan May

NeurIPS 2025posterarXiv:2502.13329
5
citations

L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models

Xiaohao Liu, Xiaobo Xia, Weixiang Zhao et al.

NeurIPS 2025posterarXiv:2505.17505
5
citations

Quantization without Tears

Minghao Fu, Hao Yu, Jie Shao et al.

CVPR 2025posterarXiv:2411.13918
14
citations

SemanticDraw: Towards Real-Time Interactive Content Creation from Image Diffusion Models

Jaerin Lee, Daniel Jung, Kanggeon Lee et al.

CVPR 2025posterarXiv:2403.09055
3
citations

Simple ReFlow: Improved Techniques for Fast Flow Models

Beomsu Kim, Yu-Guan Hsieh, Michal Klein et al.

ICLR 2025posterarXiv:2410.07815
28
citations

SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models

Muyang Li, Yujun Lin, Zhekai Zhang et al.

ICLR 2025posterarXiv:2411.05007
90
citations

TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance

Minghao Fu, Guo-Hua Wang, Xiaohao Chen et al.

ICCV 2025posterarXiv:2507.18192

ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding

Jialiang Kang, Han Shu, Wenshuo Li et al.

NeurIPS 2025posterarXiv:2509.15235
2
citations