2024 "inference acceleration" Papers

17 papers found

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Liang Chen, Haozhe Zhao, Tianyu Liu et al.

ECCV 2024 poster · arXiv:2403.06764 · 343 citations

Better & Faster Large Language Models via Multi-token Prediction

Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Roziere et al.

ICML 2024 poster

Data-free Distillation of Diffusion Models with Bootstrapping

Jiatao Gu, Chen Wang, Shuangfei Zhai et al.

ICML 2024 poster

DiJiang: Efficient Large Language Models through Compact Kernelization

Hanting Chen, Zhicheng Liu, Xutao Wang et al.

ICML 2024 poster

Distilling Diffusion Models into Conditional GANs

Minguk Kang, Richard Zhang, Connelly Barnes et al.

ECCV 2024 poster · arXiv:2405.05967 · 75 citations

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

Yuhui Li, Fangyun Wei, Chao Zhang et al.

ICML 2024 poster

EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism

Yanxi Chen, Xuchen Pan, Yaliang Li et al.

ICML 2024 poster

Expediting Contrastive Language-Image Pretraining via Self-Distilled Encoders

Bumsoo Kim, Jinhyung Kim, Yeonsik Jo et al.

AAAI 2024 paper · arXiv:2312.12659 · 5 citations

Fluctuation-Based Adaptive Structured Pruning for Large Language Models

Yongqi An, Xu Zhao, Tao Yu et al.

AAAI 2024 paper · arXiv:2312.11983 · 96 citations

How Deep Do We Need: Accelerating Training and Inference of Neural ODEs via Control Perspective

Keyan Miao, Konstantinos Gatsis

ICML 2024 oral

Online Speculative Decoding

Xiaoxuan Liu, Lanxiang Hu, Peter Bailis et al.

ICML 2024 poster

OSSCAR: One-Shot Structured Pruning in Vision and Language Models with Combinatorial Optimization

Xiang Meng, Shibal Ibrahim, Kayhan Behdin et al.

ICML 2024 poster

Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity

Lu Yin, You Wu, Zhenyu Zhang et al.

ICML 2024 poster

PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference

Tanvir Mahmud, Burhaneddin Yaman, Chun-Hao Liu et al.

ECCV 2024 poster · 7 citations

REST: Efficient and Accelerated EEG Seizure Analysis through Residual State Updates

Arshia Afzal, Grigorios Chrysos, Volkan Cevher et al.

ICML 2024 oral

SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks

Jiwon Song, Kyungseok Oh, Taesu Kim et al.

ICML 2024 poster

Switchable Decision: Dynamic Neural Generation Networks

Shujian Zhang, Korawat Tanwisuth, Chengyue Gong et al.

ICML 2024 poster