"language model inference" Papers
3 papers found
Bifurcated Attention for Single-Context Large-Batch Sampling
Ben Athiwaratkun, Sujan Kumar Gonugondla, Sanjay Krishna Gouda et al.
ICML 2024 poster
ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference
Ziqian Zeng, Yihuai Hong, Hongliang Dai et al.
AAAI 2024 paper · arXiv:2312.11882
Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo
Stephen Zhao, Rob Brekelmans, Alireza Makhzani et al.
ICML 2024 poster