2024 "inference optimization" Papers
2 papers found
BiE: Bi-Exponent Block Floating-Point for Large Language Models Quantization
Lancheng Zou, Wenqian Zhao, Shuo Yin et al.
ICML 2024poster
Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration
Zhongzhi Yu, Zheng Wang, Yonggan Fu et al.
ICML 2024poster