NeurIPS "model quantization" Papers
2 papers found
GPLQ: A General, Practical, and Lightning QAT Method for Vision Transformers
Guang Liang, Xinyao Liu, Jianxin Wu
NeurIPS 2025, poster, arXiv:2506.11784
4 citations
Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs
Hao Kang, Qingru Zhang, Han Cai et al.
NeurIPS 2025, spotlight, arXiv:2505.19481
4 citations