NeurIPS Poster "model compression" Papers
10 papers found
Balanced Token Pruning: Accelerating Vision Language Models Beyond Local Optimization
Kaiyuan Li, Xiaoyue Chen, Chen Gao et al.
NeurIPS 2025 poster · arXiv:2505.22038
4 citations
Computation and Memory-Efficient Model Compression with Gradient Reweighting
Zhiwei Li, Yuesen Liao, Binrui Wu et al.
NeurIPS 2025 poster
EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction
Hsi-Che Lin, Yu-Chu Yu, Kai-Po Chang et al.
NeurIPS 2025 poster · arXiv:2506.12015
Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks
Julia Nakhleh, Robert Nowak
NeurIPS 2025 poster · arXiv:2505.21791
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
Yuxian Gu, Qinghao Hu, Haocheng Xi et al.
NeurIPS 2025 poster · arXiv:2508.15884
15 citations
Layer as Puzzle Pieces: Compressing Large Language Models through Layer Concatenation
Fei Wang, Li Shen, Liang Ding et al.
NeurIPS 2025 poster · arXiv:2510.15304
MODEL SHAPLEY: Find Your Ideal Parameter Player via One Gradient Backpropagation
Chu Xu, Xinke Jiang, Rihong Qiu et al.
NeurIPS 2025 poster
PLD: A Choice-Theoretic List-Wise Knowledge Distillation
Ejafa Bassam, Dawei Zhu, Kaigui Bian
NeurIPS 2025 poster · arXiv:2506.12542
RSAVQ: Riemannian Sensitivity-Aware Vector Quantization for Large Language Models
Zukang Xu, Xing Hu, Qiang Wu et al.
NeurIPS 2025 poster · arXiv:2510.01240
TSENOR: Highly-Efficient Algorithm for Finding Transposable N:M Sparse Masks
Xiang Meng, Mehdi Makni, Rahul Mazumder
NeurIPS 2025 poster