"cuda kernel optimization" Papers
2 papers found
Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment
Deokjae Lee, Hyun Oh Song
NeurIPS 2025, poster, arXiv:2509.20214
STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
Peijie Dong, Lujun Li, Yuedong Zhong et al.
ICLR 2025, poster, arXiv:2408.01803
31 citations