🧬 Efficiency

Knowledge Distillation

Transferring knowledge to smaller models

240 papers (showing top 100) · 778 total citations
Mar '24 – Feb '26 · 203 papers
Also includes: knowledge distillation, distillation, teacher-student, model distillation
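
Nearly all of the papers below build on the classic teacher-student recipe: a large, frozen teacher produces soft targets that a smaller student is trained to match, usually alongside the ordinary supervised loss. The following is a minimal, illustrative PyTorch-style sketch of that standard logit-distillation objective (temperature-scaled KL divergence in the spirit of Hinton et al.); the temperature and alpha values are placeholder defaults, not taken from any paper in this list.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Blend soft-target KL (teacher vs. student) with hard-label cross-entropy.

    temperature and alpha are illustrative defaults only.
    """
    # Soften both distributions with the temperature.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)

    # KL divergence between teacher and student soft targets; the T^2 factor
    # keeps gradient magnitudes comparable across temperatures.
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2

    # Ordinary cross-entropy on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    return alpha * kd + (1.0 - alpha) * ce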

Top Papers

#1

CLIP-KD: An Empirical Study of CLIP Model Distillation

Chuanguang Yang, Zhulin An, Libo Huang et al.

CVPR 2024 · arXiv:2307.12732
78 citations
#2

Towards Continual Knowledge Graph Embedding via Incremental Distillation

Jiajun Liu, Wenjun Ke, Peng Wang et al.

AAAI 2024 · arXiv:2405.04453
knowledge graph embedding, continual learning, incremental distillation, catastrophic forgetting (+4)
39 citations
#3

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

Fangxun Shu, Yue Liao, Lei Zhang et al.

ICLR 2025 · arXiv:2408.15881
knowledge distillation, mixture of experts, multimodal language models, preference optimization (+3)
34 citations
#4

Distribution-aware Knowledge Prototyping for Non-exemplar Lifelong Person Re-identification

Kunlun Xu, Xu Zou, Yuxin Peng et al.

CVPR 2024
27 citations
#5

Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling

Wenda Xu, Rujun Han, Zifeng Wang et al.

ICLR 2025
27 citations
#6

Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting

Yuqi Li, Chuanguang Yang, Hansheng Zeng et al.

ICCV 2025 · arXiv:2507.02939
27 citations
#7

VkD: Improving Knowledge Distillation using Orthogonal Projections

Roy Miles, Ismail Elezi, Jiankang Deng

CVPR 2024
24 citations
#8

Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models

Xiao Cui, Mo Zhu, Yulei Qin et al.

AAAI 2025 · arXiv:2412.14528
22 citations
#9

Unlocking Dataset Distillation with Diffusion Models

Brian Moser, Federico Raue, Sebastian Palacio et al.

NeurIPS 2025
21 citations
#10

An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning

Jianqing Zhang, Yang Liu, Yang Hua et al.

CVPR 2024 · arXiv:2403.15760
20 citations
#11

Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation

Marco Mistretta, Alberto Baldrati, Marco Bertini et al.

ECCV 2024 · arXiv:2407.03056
20 citations
#12

Embarrassingly Simple Dataset Distillation

Yunzhen Feng, Shanmukha Ramakrishna Vedantam, Julia Kempe

ICLR 2024
20 citations
#13

Good Teachers Explain: Explanation-Enhanced Knowledge Distillation

Amin Parchami, Moritz Böhle, Sukrut Rao et al.

ECCV 2024 · arXiv:2402.03119
knowledge distillation, explainable ai, model compression, feature alignment (+2)
18 citations
#14

De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts

Yuzheng Wang, Dingkang Yang, Zhaoyu Chen et al.

CVPR 2024 · arXiv:2403.19539
17 citations
#15

KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling

Yu Wang, Xin Li, Shengzhao Wen et al.

CVPR 2024 · arXiv:2211.08071
16 citations
#16

Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective

Ming Zhong, Chenxin An, Weizhu Chen et al.

ICLR 2024 · arXiv:2310.11451
16 citations
#17

MiniPLM: Knowledge Distillation for Pre-training Language Models

Yuxian Gu, Hao Zhou, Fandong Meng et al.

ICLR 2025
16 citations
#18

Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models

Yu-Chu Yu, Chi-Pin Huang, Jr-Jen Chen et al.

ECCV 2024 · arXiv:2403.09296
vision-language models, continual learning, knowledge distillation, catastrophic forgetting (+3)
16 citations
#19

Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs

Sungmin Cha, Sungjun Cho, Dasol Hwang et al.

ICLR 2025 · arXiv:2408.06621
knowledge unlearning, gradient ascent, low-rank adaptation, inverted hinge loss (+4)
15 citations
#20

Universal Cross-Tokenizer Distillation via Approximate Likelihood Matching

Benjamin Minixhofer, Ivan Vulić, Edoardo Maria Ponti

NeurIPS 2025 · arXiv:2503.20083
14 citations
#21

A Good Learner can Teach Better: Teacher-Student Collaborative Knowledge Distillation

Ayan Sengupta, Shantanu Dixit, Md Shad Akhtar et al.

ICLR 2024
14 citations
#22

Large Language Model Meets Graph Neural Network in Knowledge Distillation

Shengxiang Hu, Guobing Zou, Song Yang et al.

AAAI 2025 · arXiv:2402.05894
14 citations
#23

Fine-Grained Knowledge Selection and Restoration for Non-exemplar Class Incremental Learning

Jiang-Tian Zhai, Xialei Liu, Lu Yu et al.

AAAI 2024 · arXiv:2312.12722
class incremental learning, knowledge distillation, catastrophic forgetting, fine-grained selection (+4)
13 citations
#24

Distilling Reliable Knowledge for Instance-Dependent Partial Label Learning

Dong-Dong Wu, Deng-Bao Wang, Min-Ling Zhang

AAAI 2024
13 citations
#25

Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching

Ruonan Yu, Songhua Liu, Jingwen Ye et al.

ECCV 2024 · arXiv:2410.07579
13 citations
#26

Asymmetric Masked Distillation for Pre-Training Small Foundation Models

Zhiyu Zhao, Bingkun Huang, Sen Xing et al.

CVPR 2024 · arXiv:2311.03149
12 citations
#27

Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition

Chuanguang Yang, XinQiang Yu, Han Yang et al.

AAAI 2025 · arXiv:2502.18510
12 citations
#28

Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models

Zheng Hu, Zhe Li, Ziyun Jiao et al.

AAAI 2025 · arXiv:2412.13544
11 citations
#29

Knowledge-Aware Parameter Coaching for Personalized Federated Learning

Mingjian Zhi, Yuanguo Bi, Wenchao Xu et al.

AAAI 2024
11 citations
#30

LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection

Sanmin Kim, Youngseok Kim, Sihwan Hwang et al.

ECCV 2024
11 citations
#31

Distilling LLM Agent into Small Models with Retrieval and Code Tools

Minki Kang, Jongwon Jeong, Seanie Lee et al.

NeurIPS 2025 · arXiv:2505.17612
11 citations
#32

How to Train the Teacher Model for Effective Knowledge Distillation

Shayan Mohajer Hamidi, Xizhen Deng, Renhao Tan et al.

ECCV 2024
11 citations
#33

Active Object Detection with Knowledge Aggregation and Distillation from Large Models

Dejie Yang, Yang Liu

CVPR 2024 · arXiv:2405.12509
9 citations
#34

MergeNet: Knowledge Migration Across Heterogeneous Models, Tasks, and Modalities

Kunxi Li, Tianyu Zhan, Kairui Fu et al.

AAAI 2025 · arXiv:2404.13322
8 citations
#35

Improving Knowledge Distillation via Regularizing Feature Direction and Norm

Yuzhu Wang, Lechao Cheng, Manni Duan et al.

ECCV 2024
8 citations
#36

Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation

Yue Xu, Yong-Lu Li, Kaitong Cui et al.

ECCV 2024 · arXiv:2305.18381
dataset distillation, data pruning, causal effects, data efficiency (+3)
8 citations
#37

Teacher as a Lenient Expert: Teacher-Agnostic Data-Free Knowledge Distillation

Hyunjune Shin, Dong-Wan Choi

AAAI 2024 · arXiv:2402.12406
knowledge distillation, data-free learning, generative adversarial networks, teacher-agnostic distillation (+3)
7 citations
#38

Improving Language Model Distillation through Hidden State Matching

Sayantan Dasgupta, Trevor Cohn

ICLR 2025
knowledge distillation, hidden state matching, centered kernel alignment, language model compression (+2)
7 citations
#39

Generative Model-Based Feature Knowledge Distillation for Action Recognition

Guiqin Wang, Peng Zhao, Yanjiang Shi et al.

AAAI 2024 · arXiv:2312.08644
knowledge distillation, action recognition, generative models, feature semantics (+4)
6 citations
#40

Boosting Residual Networks with Group Knowledge

Shengji Tang, Peng Ye, Baopu Li et al.

AAAI 2024 · arXiv:2308.13772
residual networks, implicit ensemble model, stochastic depth, knowledge distillation (+4)
6 citations
#41

Graph-Based Cross-Domain Knowledge Distillation for Cross-Dataset Text-to-Image Person Retrieval

Bingjun Luo, Jinpeng Wang, Zewen Wang et al.

AAAI 2025 · arXiv:2501.15052
5 citations
#42

CMT: A Memory Compression Method for Continual Knowledge Learning of Large Language Models

Dongfang Li, Zetian Sun, Xinshuo Hu et al.

AAAI 2025 · arXiv:2412.07393
5 citations
#43

Progressive Distillation Based on Masked Generation Feature Method for Knowledge Graph Completion

Cunhang Fan, Yujie Chen, Jun Xue et al.

AAAI 2024 · arXiv:2401.12997
knowledge graph completion, pre-trained language models, progressive distillation, masked generation features (+3)
5 citations
#44

Domain Guidance: A Simple Transfer Approach for a Pre-trained Diffusion Model

Jincheng Zhong, XiangCheng Zhang, Jianmin Wang et al.

ICLR 2025
4 citations
#45

A General Theoretical Framework for Learning Smallest Interpretable Models

Sebastian Ordyniak, Giacomo Paesani, Mateusz Banany et al.

AAAI 2024
4 citations
#46

Building Optimal Neural Architectures using Interpretable Knowledge

Keith Mills, Fred Han, Mohammad Salameh et al.

CVPR 2024 · arXiv:2403.13293
4 citations
#47

DCSF-KD: Dynamic Channel-wise Spatial Feature Knowledge Distillation for Object Detection

Tao Dai, Yang Lin, Hang Guo et al.

AAAI 2025
4 citations
#48

Dynamic Contrastive Knowledge Distillation for Efficient Image Restoration

Yunshuai Zhou, Junbo Qiao, Jincheng Liao et al.

AAAI 2025 · arXiv:2412.08939
4 citations
#49

Knowledge Distillation with Refined Logits

Wujie Sun, Defang Chen, Siwei Lyu et al.

ICCV 2025
4 citations
#50

Gatekeeper: Improving Model Cascades Through Confidence Tuning

Stephan Rabanser, Nathalie Rauschmayr, Achin Kulshrestha et al.

NeurIPS 2025 · arXiv:2502.19335
model cascades, confidence calibration, deferral mechanisms, computational efficiency (+4)
4 citations
#51

AuG-KD: Anchor-Based Mixup Generation for Out-of-Domain Knowledge Distillation

Zihao Tang, Zheqi Lv, Shengyu Zhang et al.

ICLR 2024 · arXiv:2403.07030
4 citations
#52

Less or More From Teacher: Exploiting Trilateral Geometry For Knowledge Distillation

Chengming Hu, Haolun Wu, Xuan Li et al.

ICLR 2024 · arXiv:2312.15112
3 citations
#53

What Makes a Good Dataset for Knowledge Distillation?

Logan Frank, Jim Davis

CVPR 2025 · arXiv:2411.12817
3 citations
#54

Data-to-Model Distillation: Data-Efficient Learning Framework

Ahmad Sajedi, Samir Khaki, Lucy Z. Liu et al.

ECCV 2024 · arXiv:2411.12841
dataset distillation, synthetic data generation, generative model alignment, representation learning (+4)
3 citations
#55

EA-KD: Entropy-based Adaptive Knowledge Distillation

Chi-Ping Su, Ching-Hsun Tseng, Bin Pu et al.

ICCV 2025 · arXiv:2311.13621
3 citations
#56

AugKD: Ingenious Augmentations Empower Knowledge Distillation for Image Super-Resolution

Yun Zhang, Wei Li, Simiao Li et al.

ICLR 2025
3 citations
#57

Enhancing Generalized Few-Shot Semantic Segmentation via Effective Knowledge Transfer

Xinyue Chen, Miaojing Shi, Zijian Zhou et al.

AAAI 2025 · arXiv:2412.15835
3 citations
#58

From Models to Microtheories: Distilling a Model's Topical Knowledge for Grounded Question-Answering

Nathaniel Weir, Bhavana Dalvi Mishra, Orion Weller et al.

ICLR 2025 · arXiv:2412.17701
language model interpretability, knowledge distillation, grounded question answering, microtheory extraction (+3)
3 citations
#59

Query-based Knowledge Transfer for Heterogeneous Learning Environments

Norah Alballa, Wenxuan Zhang, Ziquan Liu et al.

ICLR 2025 · arXiv:2504.09205
knowledge transfer, heterogeneous learning environments, federated learning, decentralized collaborative learning (+4)
2 citations
#60

Towards Understanding How Knowledge Evolves in Large Vision-Language Models

Sudong Wang, Yunjian Zhang, Yao Zhu et al.

CVPR 2025 · arXiv:2504.02862
2 citations
#61

Data Mixing Can Induce Phase Transitions in Knowledge Acquisition

Xinran Gu, Kaifeng Lyu, Jiazheng Li et al.

NeurIPS 2025 · arXiv:2505.18091
data mixing, knowledge acquisition, phase transitions, capacity allocation (+4)
2 citations
#62

When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration

Quan Shi, Carlos Jimenez, Shunyu Yao et al.

NeurIPS 2025
2 citations
#63

Harmonizing knowledge Transfer in Neural Network with Unified Distillation

Yaomin Huang, Faming Fang, Zaoming Yan et al.

ECCV 2024 · arXiv:2409.18565
knowledge distillation, feature aggregation, unified distillation framework, intermediate layer features (+3)
1 citation
#64

Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning

Yichen Li, Xiuying Wang, Wenchao Xu et al.

NeurIPS 2025
1 citation
#65

Cooperative Knowledge Distillation: A Learner Agnostic Approach

Michael Livanos, Ian Davidson, Stephen Wong

AAAI 2024 · arXiv:2402.05942
knowledge distillation, counterfactual instance generation, multi-model cooperation, transfer learning (+4)
1 citation
#66

Neural Collapse Inspired Knowledge Distillation

Shuoxi Zhang, Zijian Song, Kun He

AAAI 2025 · arXiv:2412.11788
1 citation
#67

Reducing Spatial Fitting Error in Distillation of Denoising Diffusion Models

Shengzhe Zhou, Zejian Li, Shengyuan Zhang et al.

AAAI 2024 · arXiv:2311.03830
diffusion model distillation, spatial fitting error, attention guidance, semantic gradient predictor (+3)
1 citation
#68

Hybrid Data-Free Knowledge Distillation

Jialiang Tang, Shuo Chen, Chen Gong

AAAI 2025 · arXiv:2412.13525
1 citation
#69

Boost Self-Supervised Dataset Distillation via Parameterization, Predefined Augmentation, and Approximation

Sheng-Feng Yu, Jia-Jiun Yao, Wei-Chen Chiu

ICLR 2025 · arXiv:2507.21455
dataset distillation, self-supervised learning, cross-architecture generalization, parameterization techniques (+3)
1 citation
#70

DistillHGNN: A Knowledge Distillation Approach for High-Speed Hypergraph Neural Networks

Saman Forouzandeh, Parham Moradi Dowlatabadi, Mahdi Jalili

ICLR 2025
knowledge distillation, hypergraph neural networks, graph convolutional networks, teacher-student models (+3)
1 citation
#71

KDAT: Inherent Adversarial Robustness via Knowledge Distillation with Adversarial Tuning for Object Detection Models

Yarin Yerushalmi Levi, Edita Grolman, Idan Yankelev et al.

AAAI 2025
1 citation
#72

SelKD: Selective Knowledge Distillation via Optimal Transport Perspective

Liangliang Shi, Zhengyan Shi, Junchi Yan

ICLR 2025
1 citation
#73

Logit Standardization in Knowledge Distillation

Shangquan Sun, Wenqi Ren, Jingzhi Li et al.

CVPR 2024 · arXiv:2403.01427
Citations not collected
#74

CrossKD: Cross-Head Knowledge Distillation for Object Detection

Jiabao Wang, Yuming Chen, Zhaohui Zheng et al.

CVPR 2024
Citations not collected
#75

Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation

Keonhee Han, Dominik Muhle, Felix Wimbauer et al.

CVPR 2024 · arXiv:2404.07933
Citations not collected
#76

Harnessing Language Model for Cross-Heterogeneity Graph Knowledge Transfer

Jinyu Yang, Ruijia Wang, Cheng Yang et al.

AAAI 2025
Citations not collected
#77

RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features

Geonho Bang, Kwangjin Choi, Jisong Kim et al.

CVPR 2024 · arXiv:2403.05061
Citations not collected
#78

Co-Progression Knowledge Distillation with Knowledge Prototype for Industrial Anomaly Detection

Bokang Yang, Zhe Zhang, Jie Ma

AAAI 2025
Citations not collected
#79

Data Shunt: Collaboration of Small and Large Models for Lower Costs and Better Performance

Dong Chen, Yueting Zhuang, Shuo Zhang et al.

AAAI 2024
Citations not collected
#80

Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models

Yushi Hu, Otilia Stretcu, Chun-Ta Lu et al.

CVPR 2024 · arXiv:2312.03052
Citations not collected
#81

Maintaining Fairness in Logit-based Knowledge Distillation for Class-Incremental Learning

Zijian Gao, Shanhao Han, Xingxing Zhang et al.

AAAI 2025
Citations not collected
#82

Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws

Zeyuan Allen-Zhu, Yuanzhi Li

ICLR 2025 · arXiv:2404.05405
knowledge capacity scaling, language model scaling laws, factual knowledge representation, model quantization effects (+4)
Citations not collected
#83

BLADE: Enhancing Black-Box Large Language Models with Small Domain-Specific Models

Haitao Li, Qingyao Ai, Jia Chen et al.

AAAI 2025
Citations not collected
#84

Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks

Huanxuan Liao, Shizhu He, Yao Xu et al.

AAAI 2025
Citations not collected
#85

Small Scale Data-Free Knowledge Distillation

He Liu, Yikai Wang, Huaping Liu et al.

CVPR 2024 · arXiv:2406.07876
Citations not collected
#86

Progressively Knowledge Distillation via Re-parameterizing Diffusion Reverse Process

Xufeng Yao, Fanbin Lu, Yuechen Zhang et al.

AAAI 2024
Citations not collected
#87

Understanding the Role of the Projector in Knowledge Distillation

AAAI 2024 · arXiv:2303.11098
Citations not collected
#88

Heuristic-free Knowledge Distillation for Streaming ASR via Multi-modal Training

Ji Won Yoon

AAAI 2025
Citations not collected
#89

Out of Thin Air: Exploring Data-Free Adversarial Robustness Distillation

Yuzheng Wang, Zhaoyu Chen, Dingkang Yang et al.

AAAI 2024 · arXiv:2303.11611
adversarial robustness distillation, data-free learning, knowledge transfer, adversarial training (+3)
Citations not collected
#90

Low-Rank Knowledge Decomposition for Medical Foundation Models

Yuhang Zhou, Haolin Li, Siyuan Du et al.

CVPR 2024
Citations not collected
#91

Small-to-Large Generalization: Training Data Influences Models Consistently Across Scale

Alaa Khaddaj, Logan Engstrom, Aleksander Madry

ICLR 2025
Citations not collected
#92

A Knowledge Distillation-Based Approach to Enhance Transparency of Classifier Models

Yuchen Jiang, Xinyuan Zhao, Yihang Wu et al.

AAAI 2025 · arXiv:2502.15959
Citations not collected
#93

Complementary Knowledge Distillation for Robust and Privacy-Preserving Model Serving in Vertical Federated Learning

Dashan Gao, Sheng Wan, Lixin Fan et al.

AAAI 2024
Citations not collected
#94

Knowledge in Superposition: Unveiling the Failures of Lifelong Knowledge Editing for Large Language Models

Chenhui Hu, Pengfei Cao, Yubo Chen et al.

AAAI 2025
Citations not collected
#95

Spatial-Temporal Knowledge Distillation for Takeaway Recommendation

Shuyuan Zhao, Wei Chen, Boyan Shi et al.

AAAI 2025 · arXiv:2412.16502
Citations not collected
#96

Self-Training Based Few-Shot Node Classification by Knowledge Distillation

Zongqian Wu, Yujie Mo, Peng Zhou et al.

AAAI 2024
Citations not collected
#97

D^4: Dataset Distillation via Disentangled Diffusion Model

Duo Su, Junjie Hou, Weizhi Gao et al.

CVPR 2024
Citations not collected
#98

Overcoming Generic Knowledge Loss with Selective Parameter Update

Wenxuan Zhang, Paul Janson, Rahaf Aljundi et al.

CVPR 2024 · arXiv:2308.12462
Citations not collected
#99

Adaptive Dual Guidance Knowledge Distillation

Tong Li, Long Liu, Kang Liu et al.

AAAI 2025
Citations not collected
#100

Real-Time Neural Denoising with Render-Aware Knowledge Distillation

Mengxun Kong, Jie Guo, Chen Wang et al.

AAAI 2025
Citations not collected