🧬 Efficiency

Knowledge Distillation

Transferring knowledge from large teacher models to smaller, more efficient student models

100 papers · 2,744 total citations
Coverage: Feb '24 to Jan '26 · 659 papers
Also includes: knowledge distillation, distillation, teacher-student, model distillation
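
For orientation, here is a minimal sketch of the classic teacher-student objective this topic is built around (soft-label distillation with a temperature, in the style of Hinton et al., 2015). It is a generic illustration of the technique, not the method of any paper listed below; the function name, temperature T, and weighting alpha are illustrative choices.

```python
# Minimal teacher-student distillation loss sketch (illustrative, not from a specific paper below).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Blend a soft-label KL term against the teacher with the usual hard-label cross-entropy."""
    # Soften both distributions with temperature T; the T^2 factor keeps gradient
    # magnitudes comparable across different temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage: a batch of 8 examples over 10 classes.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)          # teacher is frozen / run without gradients
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

Most papers in this topic vary what is matched (logits, features, attention, datasets, or entire trajectories) and between which models, but the teacher-to-student transfer objective above is the common starting point.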

Top Papers

#1

Sequential Modeling Enables Scalable Learning for Large Vision Models

Yutong Bai, Xinyang Geng, Karttikeya Mangalam et al.

CVPR 2024
230
citations
#2

FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering

Zhenyu Li, Sunqi Fan, Yu Gu et al.

AAAI 2024 · arXiv:2308.12060
knowledge base question answering · few-shot learning · sparql query generation · program translation +4
122
citations
#3

Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-Based Retrofitting

Xinyan Guan, Yanjiang Liu, Hongyu Lin et al.

AAAI 2024 · arXiv:2311.13314
knowledge graph integration · large language model hallucination · factual knowledge retrieval · autonomous knowledge verification +2
108
citations
#4

Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining

Xiang Chen, Jinshan Pan, Jiangxin Dong

CVPR 2024
83
citations
#5

Teaching Large Language Models to Regress Accurate Image Quality Scores Using Score Distribution

Zhiyuan You, Xin Cai, Jinjin Gu et al.

CVPR 2025
81
citations
#6

CLIP-KD: An Empirical Study of CLIP Model Distillation

Chuanguang Yang, Zhulin An, Libo Huang et al.

CVPR 2024
78
citations
#7

Towards Foundation Models for Knowledge Graph Reasoning

Mikhail Galkin, Xinyu Yuan, Hesham Mostafa et al.

ICLR 2024
78
citations
#8

Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models

Javier Ferrando, Oscar Obeso, Senthooran Rajamanoharan et al.

ICLR 2025 · arXiv:2411.14257
sparse autoencoders · hallucination mechanisms · entity recognition · knowledge awareness +3
77
citations
#9

Position: The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning

Micah Goldblum, Marc Finzi, Keefer Rowan et al.

ICML 2024
no free lunch theorems · kolmogorov complexity · inductive biases · supervised learning +4
60
citations
#10

Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons

Yuheng Chen, Pengfei Cao, Yubo Chen et al.

AAAI 2024 · arXiv:2308.13198
knowledge neurons · multilingual language models · factual knowledge storage · integrated gradients method +4
59
citations
#11

Inductive Moment Matching

Linqi (Alex) Zhou, Stefano Ermon, Jiaming Song

ICML 2025
54
citations
#12

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

Lanqing Guo, Yingqing He, Haoxin Chen et al.

ECCV 2024
50
citations
#13

What does the Knowledge Neuron Thesis Have to do with Knowledge?

Jingcheng Niu, Andrew Liu, Zining Zhu et al.

ICLR 2024
47
citations
#14

Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention

Jie Ren, Yaxin Li, Shenglai Zeng et al.

ECCV 2024
46
citations
#15

Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

Blake Bordelon, Lorenzo Noci, Mufan Li et al.

ICLR 2024
45
citations
#16

Data Shapley in One Training Run

Jiachen (Tianhao) Wang, Prateek Mittal, Dawn Song et al.

ICLR 2025 · arXiv:2406.11011
data attribution · data shapley · foundation model pretraining · generative ai copyright +3
44
citations
#17

Towards Continual Knowledge Graph Embedding via Incremental Distillation

Jiajun Liu, Wenjun Ke, Peng Wang et al.

AAAI 2024 · arXiv:2405.04453
knowledge graph embedding · continual learning · incremental distillation · catastrophic forgetting +4
39
citations
#18

XKD: Cross-Modal Knowledge Distillation with Domain Alignment for Video Representation Learning

Pritam Sarkar, Ali Etemad

AAAI 2024 · arXiv:2211.13929
cross-modal knowledge distillation · masked data reconstruction · domain alignment strategy · video representation learning +4
38
citations
#19

Synthetic continued pretraining

Zitong Yang, Neil Band, Shuangping Li et al.

ICLR 2025
37
citations
#20

Translate Meanings, Not Just Words: IdiomKB’s Role in Optimizing Idiomatic Translation with Language Models

Shuang Li, Jiangjie Chen, Siyu Yuan et al.

AAAI 2024 · arXiv:2308.13961
idiomatic translation · knowledge base construction · transformer-based systems · context-aware retrieval +3
35
citations
#21

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

Fangxun Shu, Yue Liao, Lei Zhang et al.

ICLR 2025
34
citations
#22

Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?

Rylan Schaeffer, Hailey Schoelkopf, Brando Miranda et al.

ICML 2025
33
citations
#23

Distilling Autoregressive Models to Obtain High-Performance Non-autoregressive Solvers for Vehicle Routing Problems with Faster Inference Speed

Yubin Xiao, Di Wang, Boyang Li et al.

AAAI 2024 · arXiv:2312.12469
knowledge distillation · autoregressive models · non-autoregressive models · vehicle routing problems +2
31
citations
#24

Dataset Distillation by Automatic Training Trajectories

Dai Liu, Jindong Gu, Hu Cao et al.

ECCV 2024
29
citations
#25

Dataset Distillation with Neural Characteristic Function: A Minmax Perspective

Shaobo Wang, Yicun Yang, Zhiyuan Liu et al.

CVPR 2025
28
citations
#26

Distribution-aware Knowledge Prototyping for Non-exemplar Lifelong Person Re-identification

Kunlun Xu, Xu Zou, Yuxin Peng et al.

CVPR 2024
27
citations
#27

Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting

Yuqi Li, Chuanguang Yang, Hansheng Zeng et al.

ICCV 2025
27
citations
#28

Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling

Wenda Xu, Rujun Han, Zifeng Wang et al.

ICLR 2025
27
citations
#29

KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems

Jusheng Zhang, Zimeng Huang, Yijia Fan et al.

ICML 2025
26
citations
#30

eTag: Class-Incremental Learning via Embedding Distillation and Task-Oriented Generation

Libo Huang, Yan Zeng, Chuanguang Yang et al.

AAAI 2024
26
citations
#31

KGGen: Extracting Knowledge Graphs from Plain Text with Language Models

Belinda Mo, Kyssen Yu, Joshua Kazdan et al.

NeurIPS 2025 · arXiv:2502.09956
knowledge graph extraction · foundation models · text-to-kg generation · entity clustering +2
25
citations
#32

DTL: Disentangled Transfer Learning for Visual Recognition

Minghao Fu, Ke Zhu, Jianxin Wu

AAAI 2024 · arXiv:2312.07856
parameter-efficient transfer learning · visual recognition · gpu memory reduction · disentangled representation learning +4
25
citations
#33

Training-Free Pretrained Model Merging

Zhengqi Xu, Ke Yuan, Huiqiong Wang et al.

CVPR 2024
24
citations
#34

Specialized Foundation Models Struggle to Beat Supervised Baselines

Zongzhe Xu, Ritvik Gupta, Wenduo Cheng et al.

ICLR 2025
24
citations
#35

VkD: Improving Knowledge Distillation using Orthogonal Projections

Roy Miles, Ismail Elezi, Jiankang Deng

CVPR 2024
24
citations
#36

FM-OV3D: Foundation Model-Based Cross-Modal Knowledge Blending for Open-Vocabulary 3D Detection

Dongmei Zhang, Chang Li, Renrui Zhang et al.

AAAI 2024 · arXiv:2312.14465
open-vocabulary 3d detection · cross-modal knowledge blending · foundation models · grounded-segment-anything +4
22
citations
#37

Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning

Somnath Basu Roy Chowdhury, Krzysztof Choromanski, Arijit Sehanobish et al.

ICLR 2025 · arXiv:2406.16257
machine unlearning · exact unlearning · parameter-efficient fine-tuning · parameter isolation +4
22
citations
#38

Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models

Xiao Cui, Mo Zhu, Yulei Qin et al.

AAAI 2025
22
citations
#39

Attention Distillation: A Unified Approach to Visual Characteristics Transfer

Yang Zhou, Xu Gao, Zichong Chen et al.

CVPR 2025
21
citations
#40

Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians

Ishan Amin, Sanjeev Raja, Aditi Krishnapriyan

ICLR 2025
21
citations
#41

Unlocking Dataset Distillation with Diffusion Models

Brian Moser, Federico Raue, Sebastian Palacio et al.

NeurIPS 2025
21
citations
#42

Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation

Marco Mistretta, Alberto Baldrati, Marco Bertini et al.

ECCV 2024
20
citations
#43

An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning

Jianqing Zhang, Yang Liu, Yang Hua et al.

CVPR 2024
20
citations
#44

Embarrassingly Simple Dataset Distillation

Yunzhen Feng, Shanmukha Ramakrishna Vedantam, Julia Kempe

ICLR 2024
20
citations
#45

To Grok or not to Grok: Disentangling Generalization and Memorization on Corrupted Algorithmic Datasets

Darshil Doshi, Aritra Das, Tianyu He et al.

ICLR 2024
19
citations
#46

You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation

Mehdi Noroozi, Isma Hadji, Brais Martinez et al.

ECCV 2024
19
citations
#47

Merging on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging

Anke Tang, Enneng Yang, Li Shen et al.

NeurIPS 2025
18
citations
#48

UNIC: Universal Classification Models via Multi-teacher Distillation

Yannis Kalantidis, Diane Larlus, Mert Bulent Sariyildiz et al.

ECCV 2024 · arXiv:2408.05088
multi-teacher distillation · universal classification models · knowledge distillation · encoder learning +3
18
citations
#49

Towards Adversarially Robust Dataset Distillation by Curvature Regularization

Eric Xue, Yijiang Li, Haoyang Liu et al.

AAAI 2025
18
citations
#50

Good Teachers Explain: Explanation-Enhanced Knowledge Distillation

Amin Parchami, Moritz Böhle, Sukrut Rao et al.

ECCV 2024 · arXiv:2402.03119
knowledge distillation · explainable ai · model compression · feature alignment +2
18
citations
#51

De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts

Yuzheng Wang, Dingkang Yang, Zhaoyu Chen et al.

CVPR 2024
17
citations
#52

Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective

Ming Zhong, Chenxin An, Weizhu Chen et al.

ICLR 2024
16
citations
#53

Mixture of Noise for Pre-Trained Model-Based Class-Incremental Learning

Kai Jiang, Zhengyan Shi, Dell Zhang et al.

NeurIPS 2025
16
citations
#54

MiniPLM: Knowledge Distillation for Pre-training Language Models

Yuxian Gu, Hao Zhou, Fandong Meng et al.

ICLR 2025
16
citations
#55

Mirage: Model-agnostic Graph Distillation for Graph Classification

Mridul Gupta, Sahil Manchanda, Hariprasad Kodamana et al.

ICLR 2024
16
citations
#56

Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models

Yu-Chu Yu, Chi-Pin Huang, Jr-Jen Chen et al.

ECCV 2024 · arXiv:2403.09296
vision-language models · continual learning · knowledge distillation · catastrophic forgetting +3
16
citations
#57

KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling

Yu Wang, Xin Li, Shengzhao Wen et al.

CVPR 2024
16
citations
#58

History Matters: Temporal Knowledge Editing in Large Language Model

Xunjian Yin, Jin Jiang, Liming Yang et al.

AAAI 2024 · arXiv:2312.05497
temporal knowledge editing · large language models · knowledge updating · catastrophic forgetting +4
15
citations
#59

Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs

Sungmin Cha, Sungjun Cho, Dasol Hwang et al.

ICLR 2025
15
citations
#60

Tracing Representation Progression: Analyzing and Enhancing Layer-Wise Similarity

Jiachen Jiang, Jinxin Zhou, Zhihui Zhu

ICLR 2025
15
citations
#61

Universal Cross-Tokenizer Distillation via Approximate Likelihood Matching

Benjamin Minixhofer, Ivan Vulić, Edoardo Maria Ponti

NeurIPS 2025
14
citations
#62

Active Data Curation Effectively Distills Large-Scale Multimodal Models

Vishaal Udandarao, Nikhil Parthasarathy, Muhammad Ferjad Naeem et al.

CVPR 2025
14
citations
#63

A Good Learner can Teach Better: Teacher-Student Collaborative Knowledge Distillation

Ayan Sengupta, Shantanu Dixit, Md Shad Akhtar et al.

ICLR 2024
14
citations
#64

Knowledge Editing with Dynamic Knowledge Graphs for Multi-Hop Question Answering

Yifan Lu, Yigeng Zhou, Jing Li et al.

AAAI 2025
14
citations
#65

Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models

Weiwei Cao, Jianpeng Zhang, Yingda Xia et al.

CVPR 2024
14
citations
#66

Deep Structural Knowledge Exploitation and Synergy for Estimating Node Importance Value on Heterogeneous Information Networks

Yankai Chen, Yixiang Fang, Qiongyan Wang et al.

AAAI 2024 · arXiv:2402.12411
node importance estimation · heterogeneous information networks · graph neural models · structural knowledge exploitation +3
14
citations
#67

AMD: Automatic Multi-step Distillation of Large-scale Vision Models

Cheng Han, Qifan Wang, Sohail A Dianat et al.

ECCV 2024 · arXiv:2407.04208
model distillation · vision model compression · transformer-based architectures · knowledge distillation +4
14
citations
#68

Large Language Model Meets Graph Neural Network in Knowledge Distillation

Shengxiang Hu, Guobing Zou, Song Yang et al.

AAAI 2025
14
citations
#69

Task-Driven Causal Feature Distillation: Towards Trustworthy Risk Prediction

Zhixuan Chu, Mengxuan Hu, Qing Cui et al.

AAAI 2024 · arXiv:2312.16113
causal feature attribution · risk prediction · class imbalance · causal reasoning +3
13
citations
#70

Distilling Reliable Knowledge for Instance-Dependent Partial Label Learning

Dong-Dong Wu, Deng-Bao Wang, Min-Ling Zhang

AAAI 2024
13
citations
#71

Fine-Grained Knowledge Selection and Restoration for Non-exemplar Class Incremental Learning

Jiang-Tian Zhai, Xialei Liu, Lu Yu et al.

AAAI 2024 · arXiv:2312.12722
class incremental learning · knowledge distillation · catastrophic forgetting · fine-grained selection +4
13
citations
#72

Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching

Ruonan Yu, Songhua Liu, Jingwen Ye et al.

ECCV 2024
13
citations
#73

The Illusion of Unlearning: The Unstable Nature of Machine Unlearning in Text-to-Image Diffusion Models

Naveen George, Karthik Nandan Dasaraju, Rutheesh Reddy Chittepu et al.

CVPR 2025
12
citations
#74

Asymmetric Masked Distillation for Pre-Training Small Foundation Models

Zhiyu Zhao, Bingkun Huang, Sen Xing et al.

CVPR 2024
12
citations
#75

TabDPT: Scaling Tabular Foundation Models on Real Data

Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh et al.

NeurIPS 2025
12
citations
#76

Cost-efficient Collaboration between On-device and Cloud Language Models

Avanika Narayan, Dan Biderman, Sabri Eyuboglu et al.

ICML 2025
12
citations
#77

Adapting Fine-Grained Cross-View Localization to Areas without Fine Ground Truth

Zimin Xia, Yujiao Shi, Hongdong Li et al.

ECCV 2024 · arXiv:2406.00474
cross-view localization · weakly supervised learning · knowledge self-distillation · pseudo ground truth +3
12
citations
#78

Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition

Chuanguang Yang, XinQiang Yu, Han Yang et al.

AAAI 2025
12
citations
#79

Unlocking Efficient, Scalable, and Continual Knowledge Editing with Basis-Level Representation Fine-Tuning

Tianci Liu, Ruirui Li, Yunzhe Qi et al.

ICLR 2025 · arXiv:2503.00306
knowledge editing · large language models · representation fine-tuning · editing-locality trade-off +3
12
citations
#80

Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation

Zhengyuan Xie, Haiquan Lu, Jia-wen Xiao et al.

ECCV 2024
12
citations
#81

Minimum-Norm Interpolation Under Covariate Shift

Neil Mallinar, Austin Zane, Spencer Frei et al.

ICML 2024
transfer learning · covariate shift · benign overfitting · linear interpolation +3
12
citations
#82

Precise Localization of Memories: A Fine-grained Neuron-level Knowledge Editing Technique for LLMs

Haowen Pan, Xiaozhi Wang, Yixin Cao et al.

ICLR 2025
11
citations
#83

PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor

Jaewon Jung, Hongsun Jang, Jaeyong Song et al.

CVPR 2024
11
citations
#84

KnowPO: Knowledge-Aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models

Ruizhe Zhang, Yongxin Xu, Yuzhen Xiao et al.

AAAI 2025
11
citations
#85

How to Train the Teacher Model for Effective Knowledge Distillation

Shayan Mohajer Hamidi, Xizhen Deng, Renhao Tan et al.

ECCV 2024
11
citations
#86

Knowledge-Aware Parameter Coaching for Personalized Federated Learning

Mingjian Zhi, Yuanguo Bi, Wenchao Xu et al.

AAAI 2024
11
citations
#87

SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning

Haiwen Diao, Bo Wan, Xu Jia et al.

ECCV 2024 · arXiv:2407.07523
parameter-efficient transfer learning · memory-efficient transfer learning · vision-and-language tasks · language-only tasks +3
11
citations
#88

Knowledge Localization: Mission Not Accomplished? Enter Query Localization!

Yuheng Chen, Pengfei Cao, Yubo Chen et al.

ICLR 2025
11
citations
#89

Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models

Zheng Hu, Zhe Li, Ziyun Jiao et al.

AAAI 2025
11
citations
#90

Distilling LLM Agent into Small Models with Retrieval and Code Tools

Minki Kang, Jongwon Jeong, Seanie Lee et al.

NeurIPS 2025
11
citations
#91

LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection

Sanmin Kim, Youngseok Kim, Sihwan Hwang et al.

ECCV 2024
11
citations
#92

Temporal Separation with Entropy Regularization for Knowledge Distillation in Spiking Neural Networks

Kairong Yu, Chengting Yu, Tianqing Zhang et al.

CVPR 2025 · arXiv:2503.03144
knowledge distillation · spiking neural networks · temporal separation · entropy regularization +3
10
citations
#93

MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection

Bokai Lin, Zihao Zeng, Zipeng Xiao et al.

ICLR 2025
10
citations
#94

Let All Be Whitened: Multi-Teacher Distillation for Efficient Visual Retrieval

Zhe Ma, Jianfeng Dong, Shouling Ji et al.

AAAI 2024 · arXiv:2312.09716
visual retrieval · multi-teacher distillation · knowledge distillation · model efficiency +3
10
citations
#95

GIFT: Unlocking Full Potential of Labels in Distilled Dataset at Near-zero Cost

Xinyi Shang, Peng Sun, Tao Lin

ICLR 2025
9
citations
#96

Quantized Prompt for Efficient Generalization of Vision-Language Models

Tianxiang Hao, Xiaohan Ding, Juexiao Feng et al.

ECCV 2024 · arXiv:2407.10704
vision-language models · prompt tuning · quantization error · catastrophic forgetting +4
9
citations
#97

Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks

Shikai Qiu, Lechao Xiao, Andrew Wilson et al.

ICML 2025
9
citations
#98

Active Object Detection with Knowledge Aggregation and Distillation from Large Models

Dejie Yang, Yang Liu

CVPR 2024
9
citations
#99

Cs2K: Class-specific and Class-shared Knowledge Guidance for Incremental Semantic Segmentation

Wei Cong, Yang Cong, Yuyang Liu et al.

ECCV 2024 · arXiv:2407.09047
incremental semantic segmentation · catastrophic forgetting · prototype-guided learning · pseudo labeling +2
9
citations
#100

Hybrid Distillation: Connecting Masked Autoencoders with Contrastive Learners

Bowen Shi, Xiaopeng Zhang, Yaoming Wang et al.

ICLR 2024
9
citations