2025 "model compression" Papers

33 papers found

Balanced Token Pruning: Accelerating Vision Language Models Beyond Local Optimization

Kaiyuan Li, Xiaoyue Chen, Chen Gao et al.

NeurIPS 2025 · poster · arXiv:2505.22038 · 4 citations

CASP: Compression of Large Multimodal Models Based on Attention Sparsity

Mohsen Gholami, Mohammad Akbari, Kevin Cannons et al.

CVPR 2025 · highlight · arXiv:2503.05936 · 2 citations

Composable Interventions for Language Models

Arinbjörn Kolbeinsson, Kyle O'Brien, Tianjin Huang et al.

ICLR 2025 · poster · arXiv:2407.06483 · 4 citations

Computation and Memory-Efficient Model Compression with Gradient Reweighting

Zhiwei Li, Yuesen Liao, Binrui Wu et al.

NeurIPS 2025 · poster

DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models

Yongqi Huang, Peng Ye, Chenyu Huang et al.

CVPR 2025 · poster · arXiv:2503.01359 · 6 citations

Dynamical Low-Rank Compression of Neural Networks with Robustness under Adversarial Attacks

Steffen Schotthöfer, Lexie Yang, Stefan Schnake

NeurIPS 2025 · oral · arXiv:2505.08022 · 6 citations

EdgeTAM: On-Device Track Anything Model

Chong Zhou, Chenchen Zhu, Yunyang Xiong et al.

CVPR 2025 · poster · arXiv:2501.07256 · 8 citations

EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction

Hsi-Che Lin, Yu-Chu Yu, Kai-Po Chang et al.

NeurIPS 2025 · poster · arXiv:2506.12015

Fast Feedforward 3D Gaussian Splatting Compression

Yihang Chen, Qianyi Wu, Mengyao Li et al.

ICLR 2025 · poster · arXiv:2410.08017 · 26 citations

FedWSQ: Efficient Federated Learning with Weight Standardization and Distribution-Aware Non-Uniform Quantization

Seung-Wook Kim, Seongyeol Kim, Jiah Kim et al.

ICCV 2025 · poster · arXiv:2506.23516

FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation

Zhuguanyu Wu, Shihe Wang, Jiayi Zhang et al.

CVPR 2025 · highlight · arXiv:2506.11543 · 5 citations

HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models

Seanie Lee, Haebin Seong, Dong Bok Lee et al.

ICLR 2025 · poster · arXiv:2410.01524 · 13 citations

Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search

Yuxian Gu, Qinghao Hu, Haocheng Xi et al.

NeurIPS 2025 · poster · arXiv:2508.15884 · 15 citations

Knowledge Distillation with Multi-granularity Mixture of Priors for Image Super-Resolution

Simiao Li, Yun Zhang, Wei Li et al.

ICLR 2025 · poster · arXiv:2404.02573 · 4 citations

Layer as Puzzle Pieces: Compressing Large Language Models through Layer Concatenation

Fei Wang, Li Shen, Liang Ding et al.

NeurIPS 2025 · poster · arXiv:2510.15304

LLaMaFlex: Many-in-one LLMs via Generalized Pruning and Weight Sharing

Ruisi Cai, Saurav Muralidharan, Hongxu Yin et al.

ICLR 2025 · poster · 4 citations

LLaVA-KD: A Framework of Distilling Multimodal Large Language Models

Yuxuan Cai, Jiangning Zhang, Haoyang He et al.

ICCV 2025 · poster · arXiv:2410.16236 · 23 citations

Mixture Compressor for Mixture-of-Experts LLMs Gains More

Wei Huang, Yue Liao, Jianhui Liu et al.

ICLR 2025 · poster · arXiv:2410.06270 · 22 citations

MODEL SHAPLEY: Find Your Ideal Parameter Player via One Gradient Backpropagation

Chu Xu, Xinke Jiang, Rihong Qiu et al.

NeurIPS 2025 · poster

MosaicDiff: Training-free Structural Pruning for Diffusion Model Acceleration Reflecting Pretraining Dynamics

Bowei Guo, Shengkun Tang, Cong Zeng et al.

ICCV 2025 · poster · arXiv:2510.11962 · 1 citation

One-Shot Knowledge Transfer for Scalable Person Re-Identification

Longhua Li, Lei Qi, Xin Geng

ICCV 2025 · poster · arXiv:2511.06016

Optimal Brain Apoptosis

Mingyuan Sun, Zheng Fang, Jiaxu Wang et al.

ICLR 2025 · poster · arXiv:2502.17941 · 3 citations

PLD: A Choice-Theoretic List-Wise Knowledge Distillation

Ejafa Bassam, Dawei Zhu, Kaigui Bian

NeurIPS 2025 · poster · arXiv:2506.12542

Quantization without Tears

Minghao Fu, Hao Yu, Jie Shao et al.

CVPR 2025 · poster · arXiv:2411.13918 · 14 citations

Quantized Spike-driven Transformer

Xuerui Qiu, Malu Zhang, Jieyuan Zhang et al.

ICLR 2025 · poster · arXiv:2501.13492 · 14 citations

RSAVQ: Riemannian Sensitivity-Aware Vector Quantization for Large Language Models

Zukang Xu, Xing Hu, Qiang Wu et al.

NeurIPS 2025 · poster · arXiv:2510.01240

SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators

Rasoul Shafipour, David Harrison, Maxwell Horton et al.

ICLR 2025 · poster · arXiv:2410.10714 · 2 citations

SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking

Xingrun Xing, Boyan Gao, Zheng Liu et al.

ICLR 2025 · poster · arXiv:2407.04752 · 21 citations

Systematic Outliers in Large Language Models

Yongqi An, Xu Zhao, Tao Yu et al.

ICLR 2025 · poster · arXiv:2502.06415 · 15 citations

TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models

Makoto Shing, Kou Misaki, Han Bao et al.

ICLR 2025 · oral · arXiv:2501.16937 · 12 citations

The Unreasonable Ineffectiveness of the Deeper Layers

Andrey Gromov, Kushal Tirumala, Hassan Shapourian et al.

ICLR 2025 · poster · arXiv:2403.17887 · 160 citations

TSENOR: Highly-Efficient Algorithm for Finding Transposable N:M Sparse Masks

Xiang Meng, Mehdi Makni, Rahul Mazumder

NeurIPS 2025 · poster

Two is Better than One: Efficient Ensemble Defense for Robust and Compact Models

Yoojin Jung, Byung Cheol Song

CVPR 2025 · poster · arXiv:2504.04747 · 1 citation