NeurIPS "low-bit quantization" Papers
5 papers found
CodeGEMM: A Codebook-Centric Approach to Efficient GEMM in Quantized LLMs
Gunho Park, Jeongin Bae, Byeongwook Kim et al.
NeurIPS 2025 · poster · arXiv:2512.17970
GPLQ: A General, Practical, and Lightning QAT Method for Vision Transformers
Guang Liang, Xinyao Liu, Jianxin Wu
NeurIPS 2025 · poster · arXiv:2506.11784
4 citations
Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization
Yamato Arai, Yuma Ichikawa
NeurIPS 2025 · poster · arXiv:2504.09629
7 citations
RSAVQ: Riemannian Sensitivity-Aware Vector Quantization for Large Language Models
Zukang Xu, Xing Hu, Qiang Wu et al.
NeurIPS 2025 · poster · arXiv:2510.01240
VETA-DiT: Variance-Equalized and Temporally Adaptive Quantization for Efficient 4-bit Diffusion Transformers
Qinkai Xu, Yijin Liu, Yang Chen et al.
NeurIPS 2025 · oral