NeurIPS 2025 "weight-only quantization" Papers
2 papers found
CodeGEMM: A Codebook-Centric Approach to Efficient GEMM in Quantized LLMs
Gunho Park, Jeongin Bae, Byeongwook Kim et al.
NeurIPS 2025, poster, arXiv:2512.17970
Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment
Deokjae Lee, Hyun Oh Song
NeurIPS 2025, poster, arXiv:2509.20214