🧬 Efficiency

Knowledge Distillation

Transferring knowledge to smaller models

240 papers (showing top 100) · 778 total citations
Mar '24 – Feb '26 · 203 papers
Also includes: knowledge distillation, distillation, teacher-student, model distillation
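
Nearly all of the papers below build on the classic teacher-student recipe: a large, frozen teacher produces soft targets that a smaller student is trained to match, usually alongside the ordinary supervised loss. The following is a minimal, illustrative PyTorch-style sketch of that standard logit-distillation objective (temperature-scaled KL divergence in the spirit of Hinton et al.); the temperature and alpha values are placeholder defaults, not taken from any paper in this list.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Blend soft-target KL (teacher vs. student) with hard-label cross-entropy.

    temperature and alpha are illustrative defaults only.
    """
    # Soften both distributions with the temperature.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)

    # KL divergence between teacher and student soft targets; the T^2 factor
    # keeps gradient magnitudes comparable across temperatures.
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2

    # Ordinary cross-entropy on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    return alpha * kd + (1.0 - alpha) * ce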

Top Papers

#1

CLIP-KD: An Empirical Study of CLIP Model Distillation

Chuanguang Yang, Zhulin An, Libo Huang et al.

CVPR 2024 · arXiv:2307.12732
78 citations
#2

Towards Continual Knowledge Graph Embedding via Incremental Distillation

Jiajun Liu, Wenjun Ke, Peng Wang et al.

AAAI 2024 · arXiv:2405.04453
knowledge graph embedding, continual learning, incremental distillation, catastrophic forgetting (+4)
39 citations
#3

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

Fangxun Shu, Yue Liao, Lei Zhang et al.

ICLR 2025 · arXiv:2408.15881
knowledge distillation, mixture of experts, multimodal language models, preference optimization (+3)
34 citations
#4

Distribution-aware Knowledge Prototyping for Non-exemplar Lifelong Person Re-identification

Kunlun Xu, Xu Zou, Yuxin Peng et al.

CVPR 2024
27 citations
#5

Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling

Wenda Xu, Rujun Han, Zifeng Wang et al.

ICLR 2025
27 citations
#6

Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting

Yuqi Li, Chuanguang Yang, Hansheng Zeng et al.

ICCV 2025 · arXiv:2507.02939
27 citations
#7

VkD: Improving Knowledge Distillation using Orthogonal Projections

Roy Miles, Ismail Elezi, Jiankang Deng

CVPR 2024
24 citations
#8

Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models

Xiao Cui, Mo Zhu, Yulei Qin et al.

AAAI 2025 · arXiv:2412.14528
22 citations
#9

Unlocking Dataset Distillation with Diffusion Models

Brian Moser, Federico Raue, Sebastian Palacio et al.

NeurIPS 2025
21 citations
#10

An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning

Jianqing Zhang, Yang Liu, Yang Hua et al.

CVPR 2024 · arXiv:2403.15760
20 citations
#11

Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation

Marco Mistretta, Alberto Baldrati, Marco Bertini et al.

ECCV 2024 · arXiv:2407.03056
20 citations
#12

Embarrassingly Simple Dataset Distillation

Yunzhen Feng, Shanmukha Ramakrishna Vedantam, Julia Kempe

ICLR 2024
20 citations
#13

Good Teachers Explain: Explanation-Enhanced Knowledge Distillation

Amin Parchami, Moritz Böhle, Sukrut Rao et al.

ECCV 2024 · arXiv:2402.03119
knowledge distillation, explainable ai, model compression, feature alignment (+2)
18 citations
#14

De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts

Yuzheng Wang, Dingkang Yang, Zhaoyu Chen et al.

CVPR 2024 · arXiv:2403.19539
17 citations
#15

KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling

Yu Wang, Xin Li, Shengzhao Wen et al.

CVPR 2024 · arXiv:2211.08071
16 citations
#16

Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective

Ming Zhong, Chenxin An, Weizhu Chen et al.

ICLR 2024 · arXiv:2310.11451
16 citations
#17

MiniPLM: Knowledge Distillation for Pre-training Language Models

Yuxian Gu, Hao Zhou, Fandong Meng et al.

ICLR 2025
16 citations
#18

Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models

Yu-Chu Yu, Chi-Pin Huang, Jr-Jen Chen et al.

ECCV 2024 · arXiv:2403.09296
vision-language models, continual learning, knowledge distillation, catastrophic forgetting (+3)
16 citations
#19

Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs

Sungmin Cha, Sungjun Cho, Dasol Hwang et al.

ICLR 2025 · arXiv:2408.06621
knowledge unlearning, gradient ascent, low-rank adaptation, inverted hinge loss (+4)
15 citations
#20

Universal Cross-Tokenizer Distillation via Approximate Likelihood Matching

Benjamin Minixhofer, Ivan Vulić, Edoardo Maria Ponti

NeurIPS 2025 · arXiv:2503.20083
14 citations
#21

A Good Learner can Teach Better: Teacher-Student Collaborative Knowledge Distillation

Ayan Sengupta, Shantanu Dixit, Md Shad Akhtar et al.

ICLR 2024
14 citations
#22

Large Language Model Meets Graph Neural Network in Knowledge Distillation

Shengxiang Hu, Guobing Zou, Song Yang et al.

AAAI 2025 · arXiv:2402.05894
14 citations
#23

Fine-Grained Knowledge Selection and Restoration for Non-exemplar Class Incremental Learning

Jiang-Tian Zhai, Xialei Liu, Lu Yu et al.

AAAI 2024 · arXiv:2312.12722
class incremental learning, knowledge distillation, catastrophic forgetting, fine-grained selection (+4)
13 citations
#24

Distilling Reliable Knowledge for Instance-Dependent Partial Label Learning

Dong-Dong Wu, Deng-Bao Wang, Min-Ling Zhang

AAAI 2024
13 citations
#25

Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching

Ruonan Yu, Songhua Liu, Jingwen Ye et al.

ECCV 2024 · arXiv:2410.07579
13 citations
#26

Asymmetric Masked Distillation for Pre-Training Small Foundation Models

Zhiyu Zhao, Bingkun Huang, Sen Xing et al.

CVPR 2024 · arXiv:2311.03149
12 citations
#27

Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition

Chuanguang Yang, XinQiang Yu, Han Yang et al.

AAAI 2025 · arXiv:2502.18510
12 citations
#28

Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models

Zheng Hu, Zhe Li, Ziyun Jiao et al.

AAAI 2025 · arXiv:2412.13544
11 citations
#29

Knowledge-Aware Parameter Coaching for Personalized Federated Learning

Mingjian Zhi, Yuanguo Bi, Wenchao Xu et al.

AAAI 2024
11 citations
#30

LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection

Sanmin Kim, Youngseok Kim, Sihwan Hwang et al.

ECCV 2024
11 citations
#31

Distilling LLM Agent into Small Models with Retrieval and Code Tools

Minki Kang, Jongwon Jeong, Seanie Lee et al.

NeurIPS 2025 · arXiv:2505.17612
11 citations
#32

How to Train the Teacher Model for Effective Knowledge Distillation

Shayan Mohajer Hamidi, Xizhen Deng, Renhao Tan et al.

ECCV 2024
11 citations
#33

Active Object Detection with Knowledge Aggregation and Distillation from Large Models

Dejie Yang, Yang Liu

CVPR 2024 · arXiv:2405.12509
9 citations
#34

MergeNet: Knowledge Migration Across Heterogeneous Models, Tasks, and Modalities

Kunxi Li, Tianyu Zhan, Kairui Fu et al.

AAAI 2025 · arXiv:2404.13322
8 citations
#35

Improving Knowledge Distillation via Regularizing Feature Direction and Norm

Yuzhu Wang, Lechao Cheng, Manni Duan et al.

ECCV 2024
8 citations
#36

Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation

Yue Xu, Yong-Lu Li, Kaitong Cui et al.

ECCV 2024 · arXiv:2305.18381
dataset distillation, data pruning, causal effects, data efficiency (+3)
8 citations
#37

Teacher as a Lenient Expert: Teacher-Agnostic Data-Free Knowledge Distillation

Hyunjune Shin, Dong-Wan Choi

AAAI 2024 · arXiv:2402.12406
knowledge distillation, data-free learning, generative adversarial networks, teacher-agnostic distillation (+3)
7 citations
#38

Improving Language Model Distillation through Hidden State Matching

Sayantan Dasgupta, Trevor Cohn

ICLR 2025
knowledge distillation, hidden state matching, centered kernel alignment, language model compression (+2)
7 citations
#39

Generative Model-Based Feature Knowledge Distillation for Action Recognition

Guiqin Wang, Peng Zhao, Yanjiang Shi et al.

AAAI 2024 · arXiv:2312.08644
knowledge distillation, action recognition, generative models, feature semantics (+4)
6 citations
#40

Boosting Residual Networks with Group Knowledge

Shengji Tang, Peng Ye, Baopu Li et al.

AAAI 2024 · arXiv:2308.13772
residual networks, implicit ensemble model, stochastic depth, knowledge distillation (+4)
6 citations
#41

Graph-Based Cross-Domain Knowledge Distillation for Cross-Dataset Text-to-Image Person Retrieval

Bingjun Luo, Jinpeng Wang, Zewen Wang et al.

AAAI 2025 · arXiv:2501.15052
5 citations
#42

CMT: A Memory Compression Method for Continual Knowledge Learning of Large Language Models

Dongfang Li, Zetian Sun, Xinshuo Hu et al.

AAAI 2025 · arXiv:2412.07393
5 citations
#43

Progressive Distillation Based on Masked Generation Feature Method for Knowledge Graph Completion

Cunhang Fan, Yujie Chen, Jun Xue et al.

AAAI 2024 · arXiv:2401.12997
knowledge graph completion, pre-trained language models, progressive distillation, masked generation features (+3)
5 citations
#44

Domain Guidance: A Simple Transfer Approach for a Pre-trained Diffusion Model

Jincheng Zhong, XiangCheng Zhang, Jianmin Wang et al.

ICLR 2025
4 citations
#45

A General Theoretical Framework for Learning Smallest Interpretable Models

Sebastian Ordyniak, Giacomo Paesani, Mateusz Banany et al.

AAAI 2024
4 citations
#46

Building Optimal Neural Architectures using Interpretable Knowledge

Keith Mills, Fred Han, Mohammad Salameh et al.

CVPR 2024 · arXiv:2403.13293
4 citations
#47

DCSF-KD: Dynamic Channel-wise Spatial Feature Knowledge Distillation for Object Detection

Tao Dai, Yang Lin, Hang Guo et al.

AAAI 2025
4 citations
#48

Dynamic Contrastive Knowledge Distillation for Efficient Image Restoration

Yunshuai Zhou, Junbo Qiao, Jincheng Liao et al.

AAAI 2025 · arXiv:2412.08939
4 citations
#49

Knowledge Distillation with Refined Logits

Wujie Sun, Defang Chen, Siwei Lyu et al.

ICCV 2025
4 citations
#50

Gatekeeper: Improving Model Cascades Through Confidence Tuning

Stephan Rabanser, Nathalie Rauschmayr, Achin Kulshrestha et al.

NeurIPS 2025 · arXiv:2502.19335
model cascades, confidence calibration, deferral mechanisms, computational efficiency (+4)
4 citations
#51

AuG-KD: Anchor-Based Mixup Generation for Out-of-Domain Knowledge Distillation

Zihao Tang, Zheqi Lv, Shengyu Zhang et al.

ICLR 2024 · arXiv:2403.07030
4 citations
#52

Less or More From Teacher: Exploiting Trilateral Geometry For Knowledge Distillation

Chengming Hu, Haolun Wu, Xuan Li et al.

ICLR 2024 · arXiv:2312.15112
3 citations
#53

What Makes a Good Dataset for Knowledge Distillation?

Logan Frank, Jim Davis

CVPR 2025 · arXiv:2411.12817
3 citations
#54

Data-to-Model Distillation: Data-Efficient Learning Framework

Ahmad Sajedi, Samir Khaki, Lucy Z. Liu et al.

ECCV 2024 · arXiv:2411.12841
dataset distillation, synthetic data generation, generative model alignment, representation learning (+4)
3 citations
#55

EA-KD: Entropy-based Adaptive Knowledge Distillation

Chi-Ping Su, Ching-Hsun Tseng, Bin Pu et al.

ICCV 2025 · arXiv:2311.13621
3 citations
#56

AugKD: Ingenious Augmentations Empower Knowledge Distillation for Image Super-Resolution

Yun Zhang, Wei Li, Simiao Li et al.

ICLR 2025
3 citations
#57

Enhancing Generalized Few-Shot Semantic Segmentation via Effective Knowledge Transfer

Xinyue Chen, Miaojing Shi, Zijian Zhou et al.

AAAI 2025 · arXiv:2412.15835
3 citations
#58

From Models to Microtheories: Distilling a Model's Topical Knowledge for Grounded Question-Answering

Nathaniel Weir, Bhavana Dalvi Mishra, Orion Weller et al.

ICLR 2025 · arXiv:2412.17701
language model interpretability, knowledge distillation, grounded question answering, microtheory extraction (+3)
3 citations
#59

Query-based Knowledge Transfer for Heterogeneous Learning Environments

Norah Alballa, Wenxuan Zhang, Ziquan Liu et al.

ICLR 2025 · arXiv:2504.09205
knowledge transfer, heterogeneous learning environments, federated learning, decentralized collaborative learning (+4)
2 citations
#60

Towards Understanding How Knowledge Evolves in Large Vision-Language Models

Sudong Wang, Yunjian Zhang, Yao Zhu et al.

CVPR 2025 · arXiv:2504.02862
2 citations
#61

Data Mixing Can Induce Phase Transitions in Knowledge Acquisition

Xinran Gu, Kaifeng Lyu, Jiazheng Li et al.

NeurIPS 2025 · arXiv:2505.18091
data mixing, knowledge acquisition, phase transitions, capacity allocation (+4)
2 citations
#62

When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration

Quan Shi, Carlos Jimenez, Shunyu Yao et al.

NeurIPS 2025
2 citations
#63

Harmonizing knowledge Transfer in Neural Network with Unified Distillation

Yaomin Huang, Faming Fang, Zaoming Yan et al.

ECCV 2024 · arXiv:2409.18565
knowledge distillation, feature aggregation, unified distillation framework, intermediate layer features (+3)
1 citation
#64

Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning

Yichen Li, Xiuying Wang, Wenchao Xu et al.

NeurIPS 2025
1 citation
#65

Cooperative Knowledge Distillation: A Learner Agnostic Approach

Michael Livanos, Ian Davidson, Stephen Wong

AAAI 2024 · arXiv:2402.05942
knowledge distillation, counterfactual instance generation, multi-model cooperation, transfer learning (+4)
1 citation
#66

Neural Collapse Inspired Knowledge Distillation

Shuoxi Zhang, Zijian Song, Kun He

AAAI 2025 · arXiv:2412.11788
1 citation
#67

Reducing Spatial Fitting Error in Distillation of Denoising Diffusion Models

Shengzhe Zhou, Zejian Li, Shengyuan Zhang et al.

AAAI 2024 · arXiv:2311.03830
diffusion model distillation, spatial fitting error, attention guidance, semantic gradient predictor (+3)
1 citation
#68

Hybrid Data-Free Knowledge Distillation

Jialiang Tang, Shuo Chen, Chen Gong

AAAI 2025 · arXiv:2412.13525
1 citation
#69

Boost Self-Supervised Dataset Distillation via Parameterization, Predefined Augmentation, and Approximation

Sheng-Feng Yu, Jia-Jiun Yao, Wei-Chen Chiu

ICLR 2025 · arXiv:2507.21455
dataset distillation, self-supervised learning, cross-architecture generalization, parameterization techniques (+3)
1 citation
#70

DistillHGNN: A Knowledge Distillation Approach for High-Speed Hypergraph Neural Networks

Saman Forouzandeh, Parham Moradi Dowlatabadi, Mahdi Jalili

ICLR 2025
knowledge distillation, hypergraph neural networks, graph convolutional networks, teacher-student models (+3)
1 citation
#71

KDAT: Inherent Adversarial Robustness via Knowledge Distillation with Adversarial Tuning for Object Detection Models

Yarin Yerushalmi Levi, Edita Grolman, Idan Yankelev et al.

AAAI 2025
1 citation
#72

SelKD: Selective Knowledge Distillation via Optimal Transport Perspective

Liangliang Shi, Zhengyan Shi, Junchi Yan

ICLR 2025
1 citation
#73

Logit Standardization in Knowledge Distillation

Shangquan Sun, Wenqi Ren, Jingzhi Li et al.

CVPR 2024 · arXiv:2403.01427
Citations not collected
#74

CrossKD: Cross-Head Knowledge Distillation for Object Detection

Jiabao Wang, Yuming Chen, Zhaohui Zheng et al.

CVPR 2024
Citations not collected
#75

Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation

Keonhee Han, Dominik Muhle, Felix Wimbauer et al.

CVPR 2024 · arXiv:2404.07933
Citations not collected
#76

Harnessing Language Model for Cross-Heterogeneity Graph Knowledge Transfer

Jinyu Yang, Ruijia Wang, Cheng Yang et al.

AAAI 2025
Citations not collected
#77

RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features

Geonho Bang, Kwangjin Choi, Jisong Kim et al.

CVPR 2024 · arXiv:2403.05061
Citations not collected
#78

Co-Progression Knowledge Distillation with Knowledge Prototype for Industrial Anomaly Detection

Bokang Yang, Zhe Zhang, Jie Ma

AAAI 2025
Citations not collected
#79

Data Shunt: Collaboration of Small and Large Models for Lower Costs and Better Performance

Dong Chen, Yueting Zhuang, Shuo Zhang et al.

AAAI 2024
Citations not collected
#80

Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models

Yushi Hu, Otilia Stretcu, Chun-Ta Lu et al.

CVPR 2024 · arXiv:2312.03052
Citations not collected
#81

Maintaining Fairness in Logit-based Knowledge Distillation for Class-Incremental Learning

Zijian Gao, Shanhao Han, Xingxing Zhang et al.

AAAI 2025
Citations not collected
#82

Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws

Zeyuan Allen-Zhu, Yuanzhi Li

ICLR 2025 · arXiv:2404.05405
knowledge capacity scaling, language model scaling laws, factual knowledge representation, model quantization effects (+4)
Citations not collected
#83

BLADE: Enhancing Black-Box Large Language Models with Small Domain-Specific Models

Haitao Li, Qingyao Ai, Jia Chen et al.

AAAI 2025
Citations not collected
#84

Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks

Huanxuan Liao, Shizhu He, Yao Xu et al.

AAAI 2025
Citations not collected
#85

Small Scale Data-Free Knowledge Distillation

He Liu, Yikai Wang, Huaping Liu et al.

CVPR 2024 · arXiv:2406.07876
Citations not collected
#86

Progressively Knowledge Distillation via Re-parameterizing Diffusion Reverse Process

Xufeng Yao, Fanbin Lu, Yuechen Zhang et al.

AAAI 2024
Citations not collected
#87

Understanding the Role of the Projector in Knowledge Distillation

AAAI 2024 · arXiv:2303.11098
Citations not collected
#88

Heuristic-free Knowledge Distillation for Streaming ASR via Multi-modal Training

Ji Won Yoon

AAAI 2025
Citations not collected
#89

Out of Thin Air: Exploring Data-Free Adversarial Robustness Distillation

Yuzheng Wang, Zhaoyu Chen, Dingkang Yang et al.

AAAI 2024 · arXiv:2303.11611
adversarial robustness distillation, data-free learning, knowledge transfer, adversarial training (+3)
Citations not collected
#90

Low-Rank Knowledge Decomposition for Medical Foundation Models

Yuhang Zhou, Haolin Li, Siyuan Du et al.

CVPR 2024
Citations not collected
#91

Small-to-Large Generalization: Training Data Influences Models Consistently Across Scale

Alaa Khaddaj, Logan Engstrom, Aleksander Madry

ICLR 2025
Citations not collected
#92

A Knowledge Distillation-Based Approach to Enhance Transparency of Classifier Models

Yuchen Jiang, Xinyuan Zhao, Yihang Wu et al.

AAAI 2025 · arXiv:2502.15959
Citations not collected
#93

Complementary Knowledge Distillation for Robust and Privacy-Preserving Model Serving in Vertical Federated Learning

Dashan Gao, Sheng Wan, Lixin Fan et al.

AAAI 2024
Citations not collected
#94

Knowledge in Superposition: Unveiling the Failures of Lifelong Knowledge Editing for Large Language Models

Chenhui Hu, Pengfei Cao, Yubo Chen et al.

AAAI 2025
Citations not collected
#95

Spatial-Temporal Knowledge Distillation for Takeaway Recommendation

Shuyuan Zhao, Wei Chen, Boyan Shi et al.

AAAI 2025 · arXiv:2412.16502
Citations not collected
#96

Self-Training Based Few-Shot Node Classification by Knowledge Distillation

Zongqian Wu, Yujie Mo, Peng Zhou et al.

AAAI 2024
Citations not collected
#97

D^4: Dataset Distillation via Disentangled Diffusion Model

Duo Su, Junjie Hou, Weizhi Gao et al.

CVPR 2024
Citations not collected
#98

Overcoming Generic Knowledge Loss with Selective Parameter Update

Wenxuan Zhang, Paul Janson, Rahaf Aljundi et al.

CVPR 2024 · arXiv:2308.12462
Citations not collected
#99

Adaptive Dual Guidance Knowledge Distillation

Tong Li, Long Liu, Kang Liu et al.

AAAI 2025
Citations not collected
#100

Real-Time Neural Denoising with Render-Aware Knowledge Distillation

Mengxun Kong, Jie Guo, Chen Wang et al.

AAAI 2025
Citations not collected