Poster Papers by Beomseok Kwon
2 papers found
LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
Gunho Park, Baeseong Park, Minsub Kim et al.
ICLR 2024, poster, arXiv:2206.09557
Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models
Jung Hwan Heo, Jeonghoon Kim, Beomseok Kwon et al.
ICLR 2024, poster, arXiv:2309.15531