ICML Poster "inference acceleration" Papers
10 papers found
Better & Faster Large Language Models via Multi-token Prediction
Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Roziere et al.
ICML 2024poster
Data-free Distillation of Diffusion Models with Bootstrapping
Jiatao Gu, Chen Wang, Shuangfei Zhai et al.
ICML 2024poster
DiJiang: Efficient Large Language Models through Compact Kernelization
Hanting Chen, Liuzhicheng Liuzhicheng, Xutao Wang et al.
ICML 2024poster
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li, Fangyun Wei, Chao Zhang et al.
ICML 2024poster
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Yanxi Chen, Xuchen Pan, Yaliang Li et al.
ICML 2024poster
Online Speculative Decoding
Xiaoxuan Liu, Lanxiang Hu, Peter Bailis et al.
ICML 2024poster
OSSCAR: One-Shot Structured Pruning in Vision and Language Models with Combinatorial Optimization
Xiang Meng, Shibal Ibrahim, Kayhan Behdin et al.
ICML 2024poster
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
Lu Yin, You Wu, Zhenyu Zhang et al.
ICML 2024poster
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
Jiwon Song, Kyungseok Oh, Taesu Kim et al.
ICML 2024poster
Switchable Decision: Dynamic Neural Generation Networks
Shujian Zhang, Korawat Tanwisuth, Chengyue Gong et al.
ICML 2024poster