🧬 Efficiency

Knowledge Distillation

Transferring knowledge from large teacher models to smaller, more efficient student models

100 papers · 2,744 total citations
Coverage: Feb '24 to Jan '26 · 659 papers
Also includes: knowledge distillation, distillation, teacher-student, model distillation
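
For orientation, here is a minimal sketch of the classic teacher-student objective this topic is built around (soft-label distillation with a temperature, in the style of Hinton et al., 2015). It is a generic illustration of the technique, not the method of any paper listed below; the function name, temperature T, and weighting alpha are illustrative choices.

```python
# Minimal teacher-student distillation loss sketch (illustrative, not from a specific paper below).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Blend a soft-label KL term against the teacher with the usual hard-label cross-entropy."""
    # Soften both distributions with temperature T; the T^2 factor keeps gradient
    # magnitudes comparable across different temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage: a batch of 8 examples over 10 classes.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)          # teacher is frozen / run without gradients
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

Most papers in this topic vary what is matched (logits, features, attention, datasets, or entire trajectories) and between which models, but the teacher-to-student transfer objective above is the common starting point.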

Top Papers

#1

Sequential Modeling Enables Scalable Learning for Large Vision Models

Yutong Bai, Xinyang Geng, Karttikeya Mangalam et al.

CVPR 2024
230
citations
#2

FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering

Zhenyu Li, Sunqi Fan, Yu Gu et al.

AAAI 2024 · arXiv:2308.12060
knowledge base question answering · few-shot learning · sparql query generation · program translation +4
122
citations
#3

Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-Based Retrofitting

Xinyan Guan, Yanjiang Liu, Hongyu Lin et al.

AAAI 2024 · arXiv:2311.13314
knowledge graph integration · large language model hallucination · factual knowledge retrieval · autonomous knowledge verification +2
108
citations
#4

Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining

Xiang Chen, Jinshan Pan, Jiangxin Dong

CVPR 2024
83
citations
#5

Teaching Large Language Models to Regress Accurate Image Quality Scores Using Score Distribution

Zhiyuan You, Xin Cai, Jinjin Gu et al.

CVPR 2025
81
citations
#6

CLIP-KD: An Empirical Study of CLIP Model Distillation

Chuanguang Yang, Zhulin An, Libo Huang et al.

CVPR 2024
78
citations
#7

Towards Foundation Models for Knowledge Graph Reasoning

Mikhail Galkin, Xinyu Yuan, Hesham Mostafa et al.

ICLR 2024
78
citations
#8

Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models

Javier Ferrando, Oscar Obeso, Senthooran Rajamanoharan et al.

ICLR 2025 · arXiv:2411.14257
sparse autoencoders · hallucination mechanisms · entity recognition · knowledge awareness +3
77
citations
#9

Position: The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning

Micah Goldblum, Marc Finzi, Keefer Rowan et al.

ICML 2024
no free lunch theorems · kolmogorov complexity · inductive biases · supervised learning +4
60
citations
#10

Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons

Yuheng Chen, Pengfei Cao, Yubo Chen et al.

AAAI 2024 · arXiv:2308.13198
knowledge neurons · multilingual language models · factual knowledge storage · integrated gradients method +4
59
citations
#11

Inductive Moment Matching

Linqi (Alex) Zhou, Stefano Ermon, Jiaming Song

ICML 2025
54
citations
#12

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

Lanqing Guo, Yingqing He, Haoxin Chen et al.

ECCV 2024
50
citations
#13

What does the Knowledge Neuron Thesis Have to do with Knowledge?

Jingcheng Niu, Andrew Liu, Zining Zhu et al.

ICLR 2024
47
citations
#14

Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention

Jie Ren, Yaxin Li, Shenglai Zeng et al.

ECCV 2024
46
citations
#15

Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

Blake Bordelon, Lorenzo Noci, Mufan Li et al.

ICLR 2024
45
citations
#16

Data Shapley in One Training Run

Jiachen (Tianhao) Wang, Prateek Mittal, Dawn Song et al.

ICLR 2025 · arXiv:2406.11011
data attribution · data shapley · foundation model pretraining · generative ai copyright +3
44
citations
#17

Towards Continual Knowledge Graph Embedding via Incremental Distillation

Jiajun Liu, Wenjun Ke, Peng Wang et al.

AAAI 2024 · arXiv:2405.04453
knowledge graph embedding · continual learning · incremental distillation · catastrophic forgetting +4
39
citations
#18

XKD: Cross-Modal Knowledge Distillation with Domain Alignment for Video Representation Learning

Pritam Sarkar, Ali Etemad

AAAI 2024 · arXiv:2211.13929
cross-modal knowledge distillation · masked data reconstruction · domain alignment strategy · video representation learning +4
38
citations
#19

Synthetic continued pretraining

Zitong Yang, Neil Band, Shuangping Li et al.

ICLR 2025
37
citations
#20

Translate Meanings, Not Just Words: IdiomKB’s Role in Optimizing Idiomatic Translation with Language Models

Shuang Li, Jiangjie Chen, Siyu Yuan et al.

AAAI 2024 · arXiv:2308.13961
idiomatic translation · knowledge base construction · transformer-based systems · context-aware retrieval +3
35
citations
#21

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

Fangxun Shu, Yue Liao, Lei Zhang et al.

ICLR 2025
34
citations
#22

Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?

Rylan Schaeffer, Hailey Schoelkopf, Brando Miranda et al.

ICML 2025
33
citations
#23

Distilling Autoregressive Models to Obtain High-Performance Non-autoregressive Solvers for Vehicle Routing Problems with Faster Inference Speed

Yubin Xiao, Di Wang, Boyang Li et al.

AAAI 2024 · arXiv:2312.12469
knowledge distillation · autoregressive models · non-autoregressive models · vehicle routing problems +2
31
citations
#24

Dataset Distillation by Automatic Training Trajectories

Dai Liu, Jindong Gu, Hu Cao et al.

ECCV 2024
29
citations
#25

Dataset Distillation with Neural Characteristic Function: A Minmax Perspective

Shaobo Wang, Yicun Yang, Zhiyuan Liu et al.

CVPR 2025
28
citations
#26

Distribution-aware Knowledge Prototyping for Non-exemplar Lifelong Person Re-identification

Kunlun Xu, Xu Zou, Yuxin Peng et al.

CVPR 2024
27
citations
#27

Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting

Yuqi Li, Chuanguang Yang, Hansheng Zeng et al.

ICCV 2025
27
citations
#28

Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling

Wenda Xu, Rujun Han, Zifeng Wang et al.

ICLR 2025
27
citations
#29

KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems

Jusheng Zhang, Zimeng Huang, Yijia Fan et al.

ICML 2025
26
citations
#30

eTag: Class-Incremental Learning via Embedding Distillation and Task-Oriented Generation

Libo Huang, Yan Zeng, Chuanguang Yang et al.

AAAI 2024
26
citations
#31

KGGen: Extracting Knowledge Graphs from Plain Text with Language Models

Belinda Mo, Kyssen Yu, Joshua Kazdan et al.

NeurIPS 2025 · arXiv:2502.09956
knowledge graph extraction · foundation models · text-to-kg generation · entity clustering +2
25
citations
#32

DTL: Disentangled Transfer Learning for Visual Recognition

Minghao Fu, Ke Zhu, Jianxin Wu

AAAI 2024 · arXiv:2312.07856
parameter-efficient transfer learning · visual recognition · gpu memory reduction · disentangled representation learning +4
25
citations
#33

Training-Free Pretrained Model Merging

Zhengqi Xu, Ke Yuan, Huiqiong Wang et al.

CVPR 2024
24
citations
#34

Specialized Foundation Models Struggle to Beat Supervised Baselines

Zongzhe Xu, Ritvik Gupta, Wenduo Cheng et al.

ICLR 2025
24
citations
#35

VkD: Improving Knowledge Distillation using Orthogonal Projections

Roy Miles, Ismail Elezi, Jiankang Deng

CVPR 2024
24
citations
#36

FM-OV3D: Foundation Model-Based Cross-Modal Knowledge Blending for Open-Vocabulary 3D Detection

Dongmei Zhang, Chang Li, Renrui Zhang et al.

AAAI 2024 · arXiv:2312.14465
open-vocabulary 3d detection · cross-modal knowledge blending · foundation models · grounded-segment-anything +4
22
citations
#37

Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning

Somnath Basu Roy Chowdhury, Krzysztof Choromanski, Arijit Sehanobish et al.

ICLR 2025 · arXiv:2406.16257
machine unlearning · exact unlearning · parameter-efficient fine-tuning · parameter isolation +4
22
citations
#38

Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models

Xiao Cui, Mo Zhu, Yulei Qin et al.

AAAI 2025
22
citations
#39

Attention Distillation: A Unified Approach to Visual Characteristics Transfer

Yang Zhou, Xu Gao, Zichong Chen et al.

CVPR 2025
21
citations
#40

Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians

Ishan Amin, Sanjeev Raja, Aditi Krishnapriyan

ICLR 2025
21
citations
#41

Unlocking Dataset Distillation with Diffusion Models

Brian Moser, Federico Raue, Sebastian Palacio et al.

NeurIPS 2025
21
citations
#42

Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation

Marco Mistretta, Alberto Baldrati, Marco Bertini et al.

ECCV 2024
20
citations
#43

An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning

Jianqing Zhang, Yang Liu, Yang Hua et al.

CVPR 2024
20
citations
#44

Embarrassingly Simple Dataset Distillation

Yunzhen Feng, Shanmukha Ramakrishna Vedantam, Julia Kempe

ICLR 2024
20
citations
#45

To Grok or not to Grok: Disentangling Generalization and Memorization on Corrupted Algorithmic Datasets

Darshil Doshi, Aritra Das, Tianyu He et al.

ICLR 2024
19
citations
#46

You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation

Mehdi Noroozi, Isma Hadji, Brais Martinez et al.

ECCV 2024
19
citations
#47

Merging on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging

Anke Tang, Enneng Yang, Li Shen et al.

NeurIPS 2025
18
citations
#48

UNIC: Universal Classification Models via Multi-teacher Distillation

Yannis Kalantidis, Diane Larlus, Mert Bulent Sariyildiz et al.

ECCV 2024 · arXiv:2408.05088
multi-teacher distillation · universal classification models · knowledge distillation · encoder learning +3
18
citations
#49

Towards Adversarially Robust Dataset Distillation by Curvature Regularization

Eric Xue, Yijiang Li, Haoyang Liu et al.

AAAI 2025
18
citations
#50

Good Teachers Explain: Explanation-Enhanced Knowledge Distillation

Amin Parchami, Moritz Böhle, Sukrut Rao et al.

ECCV 2024 · arXiv:2402.03119
knowledge distillation · explainable ai · model compression · feature alignment +2
18
citations
#51

De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts

Yuzheng Wang, Dingkang Yang, Zhaoyu Chen et al.

CVPR 2024
17
citations
#52

Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective

Ming Zhong, Chenxin An, Weizhu Chen et al.

ICLR 2024
16
citations
#53

Mixture of Noise for Pre-Trained Model-Based Class-Incremental Learning

Kai Jiang, Zhengyan Shi, Dell Zhang et al.

NeurIPS 2025
16
citations
#54

MiniPLM: Knowledge Distillation for Pre-training Language Models

Yuxian Gu, Hao Zhou, Fandong Meng et al.

ICLR 2025
16
citations
#55

Mirage: Model-agnostic Graph Distillation for Graph Classification

Mridul Gupta, Sahil Manchanda, Hariprasad Kodamana et al.

ICLR 2024
16
citations
#56

Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models

Yu-Chu Yu, Chi-Pin Huang, Jr-Jen Chen et al.

ECCV 2024 · arXiv:2403.09296
vision-language models · continual learning · knowledge distillation · catastrophic forgetting +3
16
citations
#57

KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling

Yu Wang, Xin Li, Shengzhao Wen et al.

CVPR 2024
16
citations
#58

History Matters: Temporal Knowledge Editing in Large Language Model

Xunjian Yin, Jin Jiang, Liming Yang et al.

AAAI 2024 · arXiv:2312.05497
temporal knowledge editing · large language models · knowledge updating · catastrophic forgetting +4
15
citations
#59

Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs

Sungmin Cha, Sungjun Cho, Dasol Hwang et al.

ICLR 2025
15
citations
#60

Tracing Representation Progression: Analyzing and Enhancing Layer-Wise Similarity

Jiachen Jiang, Jinxin Zhou, Zhihui Zhu

ICLR 2025
15
citations
#61

Universal Cross-Tokenizer Distillation via Approximate Likelihood Matching

Benjamin Minixhofer, Ivan Vulić, Edoardo Maria Ponti

NeurIPS 2025
14
citations
#62

Active Data Curation Effectively Distills Large-Scale Multimodal Models

Vishaal Udandarao, Nikhil Parthasarathy, Muhammad Ferjad Naeem et al.

CVPR 2025
14
citations
#63

A Good Learner can Teach Better: Teacher-Student Collaborative Knowledge Distillation

Ayan Sengupta, Shantanu Dixit, Md Shad Akhtar et al.

ICLR 2024
14
citations
#64

Knowledge Editing with Dynamic Knowledge Graphs for Multi-Hop Question Answering

Yifan Lu, Yigeng Zhou, Jing Li et al.

AAAI 2025
14
citations
#65

Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models

Weiwei Cao, Jianpeng Zhang, Yingda Xia et al.

CVPR 2024
14
citations
#66

Deep Structural Knowledge Exploitation and Synergy for Estimating Node Importance Value on Heterogeneous Information Networks

Yankai Chen, Yixiang Fang, Qiongyan Wang et al.

AAAI 2024 · arXiv:2402.12411
node importance estimation · heterogeneous information networks · graph neural models · structural knowledge exploitation +3
14
citations
#67

AMD: Automatic Multi-step Distillation of Large-scale Vision Models

Cheng Han, Qifan Wang, Sohail A Dianat et al.

ECCV 2024 · arXiv:2407.04208
model distillation · vision model compression · transformer-based architectures · knowledge distillation +4
14
citations
#68

Large Language Model Meets Graph Neural Network in Knowledge Distillation

Shengxiang Hu, Guobing Zou, Song Yang et al.

AAAI 2025
14
citations
#69

Task-Driven Causal Feature Distillation: Towards Trustworthy Risk Prediction

Zhixuan Chu, Mengxuan Hu, Qing Cui et al.

AAAI 2024 · arXiv:2312.16113
causal feature attribution · risk prediction · class imbalance · causal reasoning +3
13
citations
#70

Distilling Reliable Knowledge for Instance-Dependent Partial Label Learning

Dong-Dong Wu, Deng-Bao Wang, Min-Ling Zhang

AAAI 2024
13
citations
#71

Fine-Grained Knowledge Selection and Restoration for Non-exemplar Class Incremental Learning

Jiang-Tian Zhai, Xialei Liu, Lu Yu et al.

AAAI 2024 · arXiv:2312.12722
class incremental learning · knowledge distillation · catastrophic forgetting · fine-grained selection +4
13
citations
#72

Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching

Ruonan Yu, Songhua Liu, Jingwen Ye et al.

ECCV 2024
13
citations
#73

The Illusion of Unlearning: The Unstable Nature of Machine Unlearning in Text-to-Image Diffusion Models

Naveen George, Karthik Nandan Dasaraju, Rutheesh Reddy Chittepu et al.

CVPR 2025
12
citations
#74

Asymmetric Masked Distillation for Pre-Training Small Foundation Models

Zhiyu Zhao, Bingkun Huang, Sen Xing et al.

CVPR 2024
12
citations
#75

TabDPT: Scaling Tabular Foundation Models on Real Data

Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh et al.

NeurIPS 2025
12
citations
#76

Cost-efficient Collaboration between On-device and Cloud Language Models

Avanika Narayan, Dan Biderman, Sabri Eyuboglu et al.

ICML 2025
12
citations
#77

Adapting Fine-Grained Cross-View Localization to Areas without Fine Ground Truth

Zimin Xia, Yujiao Shi, Hongdong Li et al.

ECCV 2024 · arXiv:2406.00474
cross-view localization · weakly supervised learning · knowledge self-distillation · pseudo ground truth +3
12
citations
#78

Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition

Chuanguang Yang, XinQiang Yu, Han Yang et al.

AAAI 2025
12
citations
#79

Unlocking Efficient, Scalable, and Continual Knowledge Editing with Basis-Level Representation Fine-Tuning

Tianci Liu, Ruirui Li, Yunzhe Qi et al.

ICLR 2025 · arXiv:2503.00306
knowledge editing · large language models · representation fine-tuning · editing-locality trade-off +3
12
citations
#80

Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation

Zhengyuan Xie, Haiquan Lu, Jia-wen Xiao et al.

ECCV 2024
12
citations
#81

Minimum-Norm Interpolation Under Covariate Shift

Neil Mallinar, Austin Zane, Spencer Frei et al.

ICML 2024
transfer learning · covariate shift · benign overfitting · linear interpolation +3
12
citations
#82

Precise Localization of Memories: A Fine-grained Neuron-level Knowledge Editing Technique for LLMs

Haowen Pan, Xiaozhi Wang, Yixin Cao et al.

ICLR 2025
11
citations
#83

PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor

Jaewon Jung, Hongsun Jang, Jaeyong Song et al.

CVPR 2024
11
citations
#84

KnowPO: Knowledge-Aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models

Ruizhe Zhang, Yongxin Xu, Yuzhen Xiao et al.

AAAI 2025
11
citations
#85

How to Train the Teacher Model for Effective Knowledge Distillation

Shayan Mohajer Hamidi, Xizhen Deng, Renhao Tan et al.

ECCV 2024
11
citations
#86

Knowledge-Aware Parameter Coaching for Personalized Federated Learning

Mingjian Zhi, Yuanguo Bi, Wenchao Xu et al.

AAAI 2024
11
citations
#87

SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning

Haiwen Diao, Bo Wan, Xu Jia et al.

ECCV 2024 · arXiv:2407.07523
parameter-efficient transfer learning · memory-efficient transfer learning · vision-and-language tasks · language-only tasks +3
11
citations
#88

Knowledge Localization: Mission Not Accomplished? Enter Query Localization!

Yuheng Chen, Pengfei Cao, Yubo Chen et al.

ICLR 2025
11
citations
#89

Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models

Zheng Hu, Zhe Li, Ziyun Jiao et al.

AAAI 2025
11
citations
#90

Distilling LLM Agent into Small Models with Retrieval and Code Tools

Minki Kang, Jongwon Jeong, Seanie Lee et al.

NeurIPS 2025
11
citations
#91

LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection

Sanmin Kim, Youngseok Kim, Sihwan Hwang et al.

ECCV 2024
11
citations
#92

Temporal Separation with Entropy Regularization for Knowledge Distillation in Spiking Neural Networks

Kairong Yu, Chengting Yu, Tianqing Zhang et al.

CVPR 2025 · arXiv:2503.03144
knowledge distillation · spiking neural networks · temporal separation · entropy regularization +3
10
citations
#93

MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection

Bokai Lin, Zihao Zeng, Zipeng Xiao et al.

ICLR 2025
10
citations
#94

Let All Be Whitened: Multi-Teacher Distillation for Efficient Visual Retrieval

Zhe Ma, Jianfeng Dong, Shouling Ji et al.

AAAI 2024 · arXiv:2312.09716
visual retrieval · multi-teacher distillation · knowledge distillation · model efficiency +3
10
citations
#95

GIFT: Unlocking Full Potential of Labels in Distilled Dataset at Near-zero Cost

Xinyi Shang, Peng Sun, Tao Lin

ICLR 2025
9
citations
#96

Quantized Prompt for Efficient Generalization of Vision-Language Models

Tianxiang Hao, Xiaohan Ding, Juexiao Feng et al.

ECCV 2024 · arXiv:2407.10704
vision-language models · prompt tuning · quantization error · catastrophic forgetting +4
9
citations
#97

Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks

Shikai Qiu, Lechao Xiao, Andrew Wilson et al.

ICML 2025
9
citations
#98

Active Object Detection with Knowledge Aggregation and Distillation from Large Models

Dejie Yang, Yang Liu

CVPR 2024
9
citations
#99

Cs2K: Class-specific and Class-shared Knowledge Guidance for Incremental Semantic Segmentation

Wei Cong, Yang Cong, Yuyang Liu et al.

ECCV 2024 · arXiv:2407.09047
incremental semantic segmentation · catastrophic forgetting · prototype-guided learning · pseudo labeling +2
9
citations
#100

Hybrid Distillation: Connecting Masked Autoencoders with Contrastive Learners

Bowen Shi, Xiaopeng Zhang, Yaoming Wang et al.

ICLR 2024
9
citations