Model Compression
Making models smaller and faster
Top Papers
Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
Suyu Ge, Yunan Zhang, Liyuan Liu et al.
RECOMP: Improving Retrieval-Augmented LMs with Context Compression and Selective Augmentation
Fangyuan Xu, Weijia Shi, Eunsol Choi
HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression
Yihang Chen, Qianyi Wu, Weiyao Lin et al.
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
Gaojie Lin, Jianwen Jiang, Jiaqi Yang et al.
Consistency Models Made Easy
Zhengyang Geng, Ashwini Pokle, Weijian Luo et al.
Model Stock: All we need is just a few fine-tuned models
Dong-Hwan Jang, Sangdoo Yun, Dongyoon Han
UMA: A Family of Universal Models for Atoms
Brandon Wood, Misko Dzamba, Xiang Fu et al.
Accelerating Diffusion Transformers with Token-wise Feature Caching
Chang Zou, Xuyang Liu, Ting Liu et al.
Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching
Shitong Shao, Zeyuan Yin, Muxin Zhou et al.
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
Lanqing Guo, Yingqing He, Haoxin Chen et al.
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
Rang Meng, Xingyu Zhang, Yuming Li et al.
DiffAvatar: Simulation-Ready Garment Optimization with Differentiable Simulation
Yifei Li, Hsiaoyu Chen, Egor Larionov et al.
TinySAM: Pushing the Envelope for Efficient Segment Anything Model
Han Shu, Wenshuo Li, Yehui Tang et al.
Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On
Xu Yang, Changxing Ding, Zhibin Hong et al.
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
Le Zhuo, Liangbing Zhao, Sayak Paul et al.
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
Kai Wang, Mingjia Shi, Yukun Zhou et al.
Training-Free Pretrained Model Merging
Zhengqi Xu, Ke Yuan, Huiqiong Wang et al.
Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians
Ishan Amin, Sanjeev Raja, Aditi Krishnapriyan
Effective Interplay between Sparsity and Quantization: From Theory to Practice
Simla Harma, Ayan Chakraborty, Elizaveta Kostenok et al.
Merging on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging
Anke Tang, Enneng Yang, Li Shen et al.
Swift4D: Adaptive divide-and-conquer Gaussian Splatting for compact and efficient reconstruction of dynamic scene
Jiahao Wu, Rui Peng, Zhiyan Wang et al.
OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition
Stephen Zhang, Vardan Papyan
Locality-aware Gaussian Compression for Fast and High-quality Rendering
Seungjoo Shin, Jaesik Park, Sunghyun Cho
Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation
Xiyi Chen, Marko Mihajlovic, Shaofei Wang et al.
Active Data Curation Effectively Distills Large-Scale Multimodal Models
Vishaal Udandarao, Nikhil Parthasarathy, Muhammad Ferjad Naeem et al.
MambaIC: State Space Models for High-Performance Learned Image Compression
Fanhu Zeng, Hao Tang, Yihua Shao et al.
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Julie Kallini, Shikhar Murty, Christopher Manning et al.
Palu: KV-Cache Compression with Low-Rank Projection
Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin et al.
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
Xiang Liu, Zhenheng Tang, Peijie Dong et al.
Compression of 3D Gaussian Splatting with Optimized Feature Planes and Standard Video Codecs
Soonbin Lee, Fangwen Shu, Yago Sanchez de la Fuente et al.
Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
Chen Ju, Haicheng Wang, Haozhe Cheng et al.
Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models
Zeman Li, Xinwei Zhang, Peilin Zhong et al.
MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection
Bokai Lin, Zihao Zeng, Zipeng Xiao et al.
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)
Tianyi Zhang, Mohsen Hariri, Shaochen (Henry) Zhong et al.
PLeaS - Merging Models with Permutations and Least Squares
Anshul Nasery, Jonathan Hayase, Pang Wei Koh et al.
PowerMLP: An Efficient Version of KAN
Ruichen Qiu, Yibo Miao, Shiwen Wang et al.
You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning
Ayan Sengupta, Siddhant Chaudhary, Tanmoy Chakraborty
MagCache: Fast Video Generation with Magnitude-Aware Cache
Zehong Ma, Longhui Wei, Feng Wang et al.
Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks
Shikai Qiu, Lechao Xiao, Andrew Wilson et al.
HybridGS: High-Efficiency Gaussian Splatting Data Compression using Dual-Channel Sparse Representation and Point Cloud Encoder
Qi Yang, Le Yang, Geert Van der Auwera et al.
Towards More Accurate Diffusion Model Acceleration with A Timestep Tuner
Mengfei Xia, Yujun Shen, Changsong Lei et al.
Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes
Ziqian Bai, Feitong Tan, Sean Fanello et al.
ModSkill: Physical Character Skill Modularization
Yiming Huang, Zhiyang Dou, Lingjie Liu
AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval
Pavel Suma, Giorgos Kordopatis-Zilos, Ahmet Iscen et al.
Toward Tiny and High-quality Facial Makeup with Data Amplify Learning
Qiaoqiao Jin, Xuanhong Chen, Meiguang Jin et al.
Compression via Pre-trained Transformers: A Study on Byte-Level Multimodal Data
David Heurtel-Depeiges, Anian Ruoss, Joel Veness et al.
SwiftTry: Fast and Consistent Video Virtual Try-On with Diffusion Models
Hung Nguyen, Quang Qui-Vinh Nguyen, Khoi Nguyen et al.
Kinetics: Rethinking Test-Time Scaling Law
Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng et al.
AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models
Xinghui Li, Qichao Sun, Pengze Zhang et al.
Vision-centric Token Compression in Large Language Model
Ling Xing, Alex Jinpeng Wang, Rui Yan et al.
Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning
Shibo Jie, Yehui Tang, Jianyuan Guo et al.
Learned Image Compression with Hierarchical Progressive Context Modeling
Yuqi Li, Haotian Zhang, Li Li et al.
Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better
Enshu Liu, Junyi Zhu, Zinan Lin et al.
ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS
Weijie Wang, Donny Y. Chen, Zeyu Zhang et al.
Differentiable Product Quantization for Memory Efficient Camera Relocalization
Zakaria Laskar, Iaroslav Melekhov, Assia Benbihi et al.
Visual Persona: Foundation Model for Full-Body Human Customization
Jisu Nam, Soowon Son, Zhan Xu et al.
Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models
Xuran Ma, Yexin Liu, Yaofu Liu et al.
StableCodec: Taming One-Step Diffusion for Extreme Image Compression
Tianyu Zhang, Xin Luo, Li Li et al.
A Unified Model for Compressed Sensing MRI Across Undersampling Patterns
Armeet Singh Jatyani, Jiayun Wang, Aditi Chandrashekar et al.
Stationary Representations: Optimally Approximating Compatibility and Implications for Improved Model Replacements
Niccolò Biondi, Federico Pernici, Simone Ricci et al.
MERGE³: Efficient Evolutionary Merging on Consumer-grade GPUs
Tommaso Mencattini, Adrian Robert Minut, Donato Crisostomi et al.
Two Sparse Matrices are Better than One: Sparsifying Neural Networks with Double Sparse Factorization
Vladimir Boza, Vladimir Macko
PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation
Liyao Jiang, Negar Hassanpour, Mohammad Salameh et al.
CoMBO: Conflict Mitigation via Branched Optimization for Class Incremental Segmentation
Kai Fang, Anqi Zhang, Guangyu Gao et al.
Make Lossy Compression Meaningful for Low-Light Images
Shilv Cai, Liqun Chen, Sheng Zhong et al.
Lightweight Predictive 3D Gaussian Splats
Junli Cao, Vidit Goel, Chaoyang Wang et al.
Unsegment Anything by Simulating Deformation
Jiahao Lu, Xingyi Yang, Xinchao Wang
We Should Chart an Atlas of All the World's Models
Eliahu Horwitz, Nitzan Kurer, Jonathan Kahana et al.
ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness
Boqian Li, Zeyu Cai, Michael Black et al.
Leveraging Hierarchical Feature Sharing for Efficient Dataset Condensation
Haizhong Zheng, Jiachen Sun, Shutong Wu et al.
PhysRig: Differentiable Physics-Based Skinning and Rigging Framework for Realistic Articulated Object Modeling
Hao Zhang, Haolan Xu, Chun Feng et al.
Make Your Training Flexible: Towards Deployment-Efficient Video Models
Chenting Wang, Kunchang Li, Tianxiang Jiang et al.
Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization
Jiaxin Deng, Junbiao Pang, Baochang Zhang et al.
WISE: A Framework for Gigapixel Whole-Slide-Image Lossless Compression
Yu Mao, Jun Wang, Nan Guan et al.
Metamizer: A Versatile Neural Optimizer for Fast and Accurate Physics Simulations
Nils Wandel, Stefan Schulz, Reinhard Klein
Efficient and Accurate Explanation Estimation with Distribution Compression
Hubert Baniecki, Giuseppe Casalicchio, Bernd Bischl et al.
SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs
Shibo Jie, Yehui Tang, Kai Han et al.
Fast and Low-Cost Genomic Foundation Models via Outlier Removal
Haozheng Luo, Chenghao Qiu, Maojiang Su et al.
LLaMaFlex: Many-in-one LLMs via Generalized Pruning and Weight Sharing
Ruisi Cai, Saurav Muralidharan, Hongxu Yin et al.
Anatomically Constrained Implicit Face Models
Prashanth Chandran, Gaspard Zoss
Accelerated Methods with Compressed Communications for Distributed Optimization Problems Under Data Similarity
Dmitry Bylinkin, Aleksandr Beznosikov
Scale Efficient Training for Large Datasets
Qing Zhou, Junyu Gao, Qi Wang
CuMPerLay: Learning Cubical Multiparameter Persistence Vectorizations
Caner Korkmaz, Brighton Nuwagira, Baris Coskunuzer et al.
Scaling Down Text Encoders of Text-to-Image Diffusion Models
Lifu Wang, Daqing Liu, Xinchen Liu et al.
Compressing Streamable Free-Viewpoint Videos to 0.1 MB per Frame
Luyang Tang, Jiayu Yang, Rui Peng et al.
Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models
Alina Shutova, Vladimir Malinovskii, Vage Egiazarian et al.
A Closer Look at Model Collapse: From a Generalization-to-Memorization Perspective
Lianghe Shi, Meng Wu, Huijie Zhang et al.
ShoeModel: Learning to Wear on the User-specified Shoes via Diffusion Model
Wenyu Li, Binghui Chen, Yifeng Geng et al.
Adaptive Pruning of Pretrained Transformer via Differential Inclusions
Yizhuo Ding, Ke Fan, Yikai Wang et al.
RivuletMLP: An MLP-based Architecture for Efficient Compressed Video Quality Enhancement
Gang He, Weiran Wang, Guancheng Quan et al.
FREE-Merging: Fourier Transform for Efficient Model Merging
Shenghe Zheng, Hongzhi Wang
DMesh++: An Efficient Differentiable Mesh for Complex Shapes
Sanghyun Son, Matheus Gadelha, Yang Zhou et al.
Faster and Better 3D Splatting via Group Training
Chengbo Wang, Guozheng Ma, Yizhen Lao et al.
Improving Visual and Downstream Performance of Low-Light Enhancer with Vision Foundation Models Collaboration
Yuxuan Gu, Huaian Chen, Yi Jin et al.
Adaptive Routing of Text-to-Image Generation Requests Between Large Cloud Model and Light-Weight Edge Model
Zewei Xin, Qinya Li, Chaoyue Niu et al.
Trade-offs in Image Generation: How Do Different Dimensions Interact?
Sicheng Zhang, Binzhu Xie, Zhonghao Yan et al.
Singular Value Scaling: Efficient Generative Model Compression via Pruned Weights Refinement
Hyeonjin Kim, Jaejun Yoo
DeepCompress-ViT: Rethinking Model Compression to Enhance Efficiency of Vision Transformers at the Edge
Sabbir Ahmed, Abdullah Al Arafat, Deniz Najafi et al.
Test-Time Fine-Tuning of Image Compression Models for Multi-Task Adaptability
Unki Park, Seongmoon Jeong, Jang Youngchan et al.
PocketSR: The Super-Resolution Expert in Your Pocket Mobiles
Haoze Sun, Linfeng Jiang, Fan Li et al.