"model compression" Papers

58 papers found • Page 1 of 2

CASP: Compression of Large Multimodal Models Based on Attention Sparsity

Mohsen Gholami, Mohammad Akbari, Kevin Cannons et al.

CVPR 2025highlightarXiv:2503.05936
2
citations

Computation and Memory-Efficient Model Compression with Gradient Reweighting

Zhiwei Li, Yuesen Liao, Binrui Wu et al.

NeurIPS 2025poster

DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models

Yongqi Huang, Peng Ye, Chenyu Huang et al.

CVPR 2025posterarXiv:2503.01359
6
citations

Dynamical Low-Rank Compression of Neural Networks with Robustness under Adversarial Attacks

Steffen Schotthöfer, Lexie Yang, Stefan Schnake

NeurIPS 2025oralarXiv:2505.08022
6
citations

EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction

Hsi-Che Lin, Yu-Chu Yu, Kai-Po Chang et al.

NeurIPS 2025posterarXiv:2506.12015

FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation

Zhuguanyu Wu, Shihe Wang, Jiayi Zhang et al.

CVPR 2025highlightarXiv:2506.11543
5
citations

HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models

Seanie Lee, Haebin Seong, Dong Bok Lee et al.

ICLR 2025posterarXiv:2410.01524
13
citations

Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search

Yuxian Gu, Qinghao Hu, Haocheng Xi et al.

NeurIPS 2025posterarXiv:2508.15884
15
citations

Layer as Puzzle Pieces: Compressing Large Language Models through Layer Concatenation

Fei Wang, Li Shen, Liang Ding et al.

NeurIPS 2025posterarXiv:2510.15304

LLaMaFlex: Many-in-one LLMs via Generalized Pruning and Weight Sharing

Ruisi Cai, Saurav Muralidharan, Hongxu Yin et al.

ICLR 2025poster
4
citations

MosaicDiff: Training-free Structural Pruning for Diffusion Model Acceleration Reflecting Pretraining Dynamics

Bowei Guo, Shengkun Tang, Cong Zeng et al.

ICCV 2025posterarXiv:2510.11962
1
citations

Optimal Brain Apoptosis

Mingyuan Sun, Zheng Fang, Jiaxu Wang et al.

ICLR 2025posterarXiv:2502.17941
3
citations

Quantized Spike-driven Transformer

Xuerui Qiu, Malu Zhang, Jieyuan Zhang et al.

ICLR 2025posterarXiv:2501.13492
14
citations

RSAVQ: Riemannian Sensitivity-Aware Vector Quantization for Large Language Models

Zukang Xu, Xing Hu, Qiang Wu et al.

NeurIPS 2025posterarXiv:2510.01240

SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators

Rasoul Shafipour, David Harrison, Maxwell Horton et al.

ICLR 2025posterarXiv:2410.10714
2
citations

SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking

Xingrun Xing, Boyan Gao, Zheng Liu et al.

ICLR 2025posterarXiv:2407.04752
21
citations

Systematic Outliers in Large Language Models

Yongqi An, Xu Zhao, Tao Yu et al.

ICLR 2025posterarXiv:2502.06415
15
citations

The Unreasonable Ineffectiveness of the Deeper Layers

Andrey Gromov, Kushal Tirumala, Hassan Shapourian et al.

ICLR 2025posterarXiv:2403.17887
160
citations

TSENOR: Highly-Efficient Algorithm for Finding Transposable N:M Sparse Masks

Xiang Meng, Mehdi Makni, Rahul Mazumder

NeurIPS 2025poster

An Empirical Study of CLIP for Text-Based Person Search

Cao Min, Yang Bai, ziyin Zeng et al.

AAAI 2024paperarXiv:2308.10045
94
citations

Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs

Yeonhong Park, Jake Hyun, SangLyul Cho et al.

ICML 2024poster

Bayesian Knowledge Distillation: A Bayesian Perspective of Distillation with Uncertainty Quantification

Luyang Fang, Yongkai Chen, Wenxuan Zhong et al.

ICML 2024poster

BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Wei Huang, Yangdong Liu, Haotong Qin et al.

ICML 2024poster

BiPFT: Binary Pre-trained Foundation Transformer with Low-Rank Estimation of Binarization Residual Polynomials

Xingrun Xing, Li Du, Xinyuan Wang et al.

AAAI 2024paperarXiv:2312.08937

Building Variable-Sized Models via Learngene Pool

Boyu Shi, Shiyu Xia, Xu Yang et al.

AAAI 2024paperarXiv:2312.05743
5
citations

CHAI: Clustered Head Attention for Efficient LLM Inference

Saurabh Agarwal, Bilge Acun, Basil Hosmer et al.

ICML 2024poster

Compressing Large Language Models by Joint Sparsification and Quantization

Jinyang Guo, Jianyu Wu, Zining Wang et al.

ICML 2024poster

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

Junyuan Hong, Jinhao Duan, Chenhui Zhang et al.

ICML 2024poster

DFD: Distilling the Feature Disparity Differently for Detectors

Kang Liu, Yingyi Zhang, Jingyun Zhang et al.

ICML 2024poster

DistiLLM: Towards Streamlined Distillation for Large Language Models

Jongwoo Ko, Sungnyun Kim, Tianyi Chen et al.

ICML 2024poster

Do Topological Characteristics Help in Knowledge Distillation?

Jungeun Kim, Junwon You, Dongjin Lee et al.

ICML 2024poster

DεpS: Delayed ε-Shrinking for Faster Once-For-All Training

Aditya Annavajjala, Alind Khare, Animesh Agrawal et al.

ECCV 2024posterarXiv:2407.06167
1
citations

Enhancing Vision Transformer: Amplifying Non-Linearity in Feedforward Network Module

Yixing Xu, Chao Li, Dong Li et al.

ICML 2024poster

Entropy Induced Pruning Framework for Convolutional Neural Networks

Yiheng Lu, Ziyu Guan, Yaming Yang et al.

AAAI 2024paperarXiv:2208.06660
6
citations

EPSD: Early Pruning with Self-Distillation for Efficient Model Compression

Dong Chen, Ning Liu, Yichen Zhu et al.

AAAI 2024paperarXiv:2402.00084
8
citations

ERQ: Error Reduction for Post-Training Quantization of Vision Transformers

Yunshan Zhong, Jiawei Hu, You Huang et al.

ICML 2024spotlight

ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking

Wenshuo Li, Xinghao Chen, Han Shu et al.

ICML 2024poster

Exploring Intrinsic Dimension for Vision-Language Model Pruning

Hanzhang Wang, Jiawen Zhang, Qingyuan Ma

ICML 2024poster

Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation

Zhewei Yao, Xiaoxia Wu, Cheng Li et al.

AAAI 2024paperarXiv:2303.08302
70
citations

Extreme Compression of Large Language Models via Additive Quantization

Vage Egiazarian, Andrei Panferov, Denis Kuznedelev et al.

ICML 2024poster

Flextron: Many-in-One Flexible Large Language Model

Ruisi Cai, Saurav Muralidharan, Greg Heinrich et al.

ICML 2024poster

Fluctuation-Based Adaptive Structured Pruning for Large Language Models

Yongqi An, Xu Zhao, Tao Yu et al.

AAAI 2024paperarXiv:2312.11983
96
citations

FrameQuant: Flexible Low-Bit Quantization for Transformers

Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang et al.

ICML 2024poster

Generative Model-Based Feature Knowledge Distillation for Action Recognition

Guiqin Wang, Peng Zhao, Yanjiang Shi et al.

AAAI 2024paperarXiv:2312.08644
6
citations

Good Teachers Explain: Explanation-Enhanced Knowledge Distillation

Amin Parchami, Moritz Böhle, Sukrut Rao et al.

ECCV 2024posterarXiv:2402.03119
18
citations

Junk DNA Hypothesis: Pruning Small Pre-Trained Weights $\textit{Irreversibly}$ and $\textit{Monotonically}$ Impairs ``Difficult" Downstream Tasks in LLMs

Lu Yin, Ajay Jaiswal, Shiwei Liu et al.

ICML 2024poster

KernelWarehouse: Rethinking the Design of Dynamic Convolution

Chao Li, Anbang Yao

ICML 2024poster

Lightweight Image Super-Resolution via Flexible Meta Pruning

Yulun Zhang, Kai Zhang, Luc Van Gool et al.

ICML 2024poster

Localizing Task Information for Improved Model Merging and Compression

Ke Wang, Nikolaos Dimitriadis, Guillermo Ortiz-Jimenez et al.

ICML 2024poster

OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models

Changhun Lee, Jungyu Jin, Taesu Kim et al.

AAAI 2024paperarXiv:2306.02272
100
citations
← PreviousNext →