Model Compression
Making models smaller and faster
Top Papers
Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
Suyu Ge, Yunan Zhang, Liyuan Liu et al.
RECOMP: Improving Retrieval-Augmented LMs with Context Compression and Selective Augmentation
Fangyuan Xu, Weijia Shi, Eunsol Choi
HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression
Yihang Chen, Qianyi Wu, Weiyao Lin et al.
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
Gaojie Lin, Jianwen Jiang, Jiaqi Yang et al.
Consistency Models Made Easy
Zhengyang Geng, Ashwini Pokle, Weijian Luo et al.
Model Stock: All we need is just a few fine-tuned models
Dong-Hwan Jang, Sangdoo Yun, Dongyoon Han
UMA: A Family of Universal Models for Atoms
Brandon Wood, Misko Dzamba, Xiang Fu et al.
Accelerating Diffusion Transformers with Token-wise Feature Caching
Chang Zou, Xuyang Liu, Ting Liu et al.
Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching
Shitong Shao, Zeyuan Yin, Muxin Zhou et al.
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
Lanqing Guo, Yingqing He, Haoxin Chen et al.
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
Rang Meng, Xingyu Zhang, Yuming Li et al.
DiffAvatar: Simulation-Ready Garment Optimization with Differentiable Simulation
Yifei Li, Hsiaoyu Chen, Egor Larionov et al.
TinySAM: Pushing the Envelope for Efficient Segment Anything Model
Han Shu, Wenshuo Li, Yehui Tang et al.
Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On
Xu Yang, Changxing Ding, Zhibin Hong et al.
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
Le Zhuo, Liangbing Zhao, Sayak Paul et al.
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
Kai Wang, Mingjia Shi, Yukun Zhou et al.
Training-Free Pretrained Model Merging
Zhengqi Xu, Ke Yuan, Huiqiong Wang et al.
Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians
Ishan Amin, Sanjeev Raja, Aditi Krishnapriyan
Effective Interplay between Sparsity and Quantization: From Theory to Practice
Simla Harma, Ayan Chakraborty, Elizaveta Kostenok et al.
Merging on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging
Anke Tang, Enneng Yang, Li Shen et al.
Swift4D: Adaptive divide-and-conquer Gaussian Splatting for compact and efficient reconstruction of dynamic scene
Jiahao Wu, Rui Peng, Zhiyan Wang et al.
OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition
Stephen Zhang, Vardan Papyan
Locality-aware Gaussian Compression for Fast and High-quality Rendering
Seungjoo Shin, Jaesik Park, Sunghyun Cho
Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation
Xiyi Chen, Marko Mihajlovic, Shaofei Wang et al.
Active Data Curation Effectively Distills Large-Scale Multimodal Models
Vishaal Udandarao, Nikhil Parthasarathy, Muhammad Ferjad Naeem et al.
MambaIC: State Space Models for High-Performance Learned Image Compression
Fanhu Zeng, Hao Tang, Yihua Shao et al.
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Julie Kallini, Shikhar Murty, Christopher Manning et al.
Palu: KV-Cache Compression with Low-Rank Projection
Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin et al.
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
Xiang Liu, Zhenheng Tang, Peijie Dong et al.
Compression of 3D Gaussian Splatting with Optimized Feature Planes and Standard Video Codecs
Soonbin Lee, Fangwen Shu, Yago Sanchez de la Fuente et al.
Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
Chen Ju, Haicheng Wang, Haozhe Cheng et al.
Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models
Zeman Li, Xinwei Zhang, Peilin Zhong et al.
MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection
Bokai Lin, Zihao Zeng, Zipeng Xiao et al.
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)
Tianyi Zhang, Mohsen Hariri, Shaochen (Henry) Zhong et al.
PLeaS - Merging Models with Permutations and Least Squares
Anshul Nasery, Jonathan Hayase, Pang Wei Koh et al.
PowerMLP: An Efficient Version of KAN
Ruichen Qiu, Yibo Miao, Shiwen Wang et al.
You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning
Ayan Sengupta, Siddhant Chaudhary, Tanmoy Chakraborty
MagCache: Fast Video Generation with Magnitude-Aware Cache
Zehong Ma, Longhui Wei, Feng Wang et al.
Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks
Shikai Qiu, Lechao Xiao, Andrew Wilson et al.
HybridGS: High-Efficiency Gaussian Splatting Data Compression using Dual-Channel Sparse Representation and Point Cloud Encoder
Qi Yang, Le Yang, Geert Van der Auwera et al.
Towards More Accurate Diffusion Model Acceleration with A Timestep Tuner
Mengfei Xia, Yujun Shen, Changsong Lei et al.
Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes
Ziqian Bai, Feitong Tan, Sean Fanello et al.
ModSkill: Physical Character Skill Modularization
Yiming Huang, Zhiyang Dou, Lingjie Liu
AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval
Pavel Suma, Giorgos Kordopatis-Zilos, Ahmet Iscen et al.
Toward Tiny and High-quality Facial Makeup with Data Amplify Learning
Qiaoqiao Jin, Xuanhong Chen, Meiguang Jin et al.
Compression via Pre-trained Transformers: A Study on Byte-Level Multimodal Data
David Heurtel-Depeiges, Anian Ruoss, Joel Veness et al.
SwiftTry: Fast and Consistent Video Virtual Try-On with Diffusion Models
Hung Nguyen, Quang Qui-Vinh Nguyen, Khoi Nguyen et al.
Kinetics: Rethinking Test-Time Scaling Law
Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng et al.
AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models
Xinghui Li, Qichao Sun, Pengze Zhang et al.
Vision-centric Token Compression in Large Language Model
Ling Xing, Alex Jinpeng Wang, Rui Yan et al.
Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning
Shibo Jie, Yehui Tang, Jianyuan Guo et al.
Learned Image Compression with Hierarchical Progressive Context Modeling
Yuqi Li, Haotian Zhang, Li Li et al.
Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better
Enshu Liu, Junyi Zhu, Zinan Lin et al.
ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS
Weijie Wang, Donny Y. Chen, Zeyu Zhang et al.
Differentiable Product Quantization for Memory Efficient Camera Relocalization
Zakaria Laskar, Iaroslav Melekhov, Assia Benbihi et al.
Visual Persona: Foundation Model for Full-Body Human Customization
Jisu Nam, Soowon Son, Zhan Xu et al.
Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models
Xuran Ma, Yexin Liu, Yaofu Liu et al.
StableCodec: Taming One-Step Diffusion for Extreme Image Compression
Tianyu Zhang, Xin Luo, Li Li et al.
A Unified Model for Compressed Sensing MRI Across Undersampling Patterns
Armeet Singh Jatyani, Jiayun Wang, Aditi Chandrashekar et al.
Stationary Representations: Optimally Approximating Compatibility and Implications for Improved Model Replacements
Niccolò Biondi, Federico Pernici, Simone Ricci et al.
MERGE³: Efficient Evolutionary Merging on Consumer-grade GPUs
Tommaso Mencattini, Adrian Robert Minut, Donato Crisostomi et al.
Two Sparse Matrices are Better than One: Sparsifying Neural Networks with Double Sparse Factorization
Vladimir Boza, Vladimir Macko
PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation
Liyao Jiang, Negar Hassanpour, Mohammad Salameh et al.
CoMBO: Conflict Mitigation via Branched Optimization for Class Incremental Segmentation
Kai Fang, Anqi Zhang, Guangyu Gao et al.
Make Lossy Compression Meaningful for Low-Light Images
Shilv Cai, Liqun Chen, Sheng Zhong et al.
Lightweight Predictive 3D Gaussian Splats
Junli Cao, Vidit Goel, Chaoyang Wang et al.
Unsegment Anything by Simulating Deformation
Jiahao Lu, Xingyi Yang, Xinchao Wang
We Should Chart an Atlas of All the World's Models
Eliahu Horwitz, Nitzan Kurer, Jonathan Kahana et al.
ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness
Boqian Li, Zeyu Cai, Michael Black et al.
Leveraging Hierarchical Feature Sharing for Efficient Dataset Condensation
Haizhong Zheng, Jiachen Sun, Shutong Wu et al.
PhysRig: Differentiable Physics-Based Skinning and Rigging Framework for Realistic Articulated Object Modeling
Hao Zhang, Haolan Xu, Chun Feng et al.
Make Your Training Flexible: Towards Deployment-Efficient Video Models
Chenting Wang, Kunchang Li, Tianxiang Jiang et al.
Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization
Jiaxin Deng, Junbiao Pang, Baochang Zhang et al.
WISE: A Framework for Gigapixel Whole-Slide-Image Lossless Compression
Yu Mao, Jun Wang, Nan Guan et al.
Metamizer: A Versatile Neural Optimizer for Fast and Accurate Physics Simulations
Nils Wandel, Stefan Schulz, Reinhard Klein
Efficient and Accurate Explanation Estimation with Distribution Compression
Hubert Baniecki, Giuseppe Casalicchio, Bernd Bischl et al.
SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs
Shibo Jie, Yehui Tang, Kai Han et al.
Fast and Low-Cost Genomic Foundation Models via Outlier Removal
Haozheng Luo, Chenghao Qiu, Maojiang Su et al.
LLaMaFlex: Many-in-one LLMs via Generalized Pruning and Weight Sharing
Ruisi Cai, Saurav Muralidharan, Hongxu Yin et al.
Anatomically Constrained Implicit Face Models
Prashanth Chandran, Gaspard Zoss
Accelerated Methods with Compressed Communications for Distributed Optimization Problems Under Data Similarity
Dmitry Bylinkin, Aleksandr Beznosikov
Scale Efficient Training for Large Datasets
Qing Zhou, Junyu Gao, Qi Wang
CuMPerLay: Learning Cubical Multiparameter Persistence Vectorizations
Caner Korkmaz, Brighton Nuwagira, Baris Coskunuzer et al.
Scaling Down Text Encoders of Text-to-Image Diffusion Models
Lifu Wang, Daqing Liu, Xinchen Liu et al.
Compressing Streamable Free-Viewpoint Videos to 0.1 MB per Frame
Luyang Tang, Jiayu Yang, Rui Peng et al.
Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models
Alina Shutova, Vladimir Malinovskii, Vage Egiazarian et al.
A Closer Look at Model Collapse: From a Generalization-to-Memorization Perspective
Lianghe Shi, Meng Wu, Huijie Zhang et al.
ShoeModel: Learning to Wear on the User-specified Shoes via Diffusion Model
Wenyu Li, Binghui Chen, Yifeng Geng et al.
Adaptive Pruning of Pretrained Transformer via Differential Inclusions
Yizhuo Ding, Ke Fan, Yikai Wang et al.
RivuletMLP: An MLP-based Architecture for Efficient Compressed Video Quality Enhancement
Gang He, Weiran Wang, Guancheng Quan et al.
FREE-Merging: Fourier Transform for Efficient Model Merging
Shenghe Zheng, Hongzhi Wang
DMesh++: An Efficient Differentiable Mesh for Complex Shapes
Sanghyun Son, Matheus Gadelha, Yang Zhou et al.
Faster and Better 3D Splatting via Group Training
Chengbo Wang, Guozheng Ma, Yizhen Lao et al.
Improving Visual and Downstream Performance of Low-Light Enhancer with Vision Foundation Models Collaboration
Yuxuan Gu, Huaian Chen, Yi Jin et al.
Adaptive Routing of Text-to-Image Generation Requests Between Large Cloud Model and Light-Weight Edge Model
Zewei Xin, Qinya Li, Chaoyue Niu et al.
Trade-offs in Image Generation: How Do Different Dimensions Interact?
Sicheng Zhang, Binzhu Xie, Zhonghao Yan et al.
Singular Value Scaling: Efficient Generative Model Compression via Pruned Weights Refinement
Hyeonjin Kim, Jaejun Yoo
DeepCompress-ViT: Rethinking Model Compression to Enhance Efficiency of Vision Transformers at the Edge
Sabbir Ahmed, Abdullah Al Arafat, Deniz Najafi et al.
Test-Time Fine-Tuning of Image Compression Models for Multi-Task Adaptability
Unki Park, Seongmoon Jeong, Jang Youngchan et al.
PocketSR: The Super-Resolution Expert in Your Pocket Mobiles
Haoze Sun, Linfeng Jiang, Fan Li et al.