Knowledge Distillation
Transferring knowledge from large teacher models to smaller student models
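For orientation, below is a minimal sketch of the classic logit-distillation objective (Hinton et al., 2015) in PyTorch. It is a generic illustration of the theme of this list, not the method of any particular paper below; the temperature and weighting values are placeholder assumptions.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft teacher-matching loss and a hard label loss."""
    # Soften both output distributions with the temperature, then match
    # them via KL divergence; the T^2 factor keeps gradients on a
    # comparable scale to the hard cross-entropy term.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_targets,
                         reduction="batchmean") * temperature ** 2
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

In practice the teacher's logits are computed under torch.no_grad() and only the student is optimized on this combined loss.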
Top Papers
Sequential Modeling Enables Scalable Learning for Large Vision Models
Yutong Bai, Xinyang Geng, Karttikeya Mangalam et al.
FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering
Zhenyu Li, Sunqi Fan, Yu Gu et al.
Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-Based Retrofitting
Xinyan Guan, Yanjiang Liu, Hongyu Lin et al.
Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining
Xiang Chen, Jinshan Pan, Jiangxin Dong
Teaching Large Language Models to Regress Accurate Image Quality Scores Using Score Distribution
Zhiyuan You, Xin Cai, Jinjin Gu et al.
CLIP-KD: An Empirical Study of CLIP Model Distillation
Chuanguang Yang, Zhulin An, Libo Huang et al.
Towards Foundation Models for Knowledge Graph Reasoning
Mikhail Galkin, Xinyu Yuan, Hesham Mostafa et al.
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Javier Ferrando, Oscar Obeso, Senthooran Rajamanoharan et al.
Position: The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning
Micah Goldblum, Marc Finzi, Keefer Rowan et al.
Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons
Yuheng Chen, Pengfei Cao, Yubo Chen et al.
Inductive Moment Matching
Linqi (Alex) Zhou, Stefano Ermon, Jiaming Song
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
Lanqing Guo, Yingqing He, Haoxin Chen et al.
What does the Knowledge Neuron Thesis Have to do with Knowledge?
Jingcheng Niu, Andrew Liu, Zining Zhu et al.
Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention
Jie Ren, Yaxin Li, Shenglai Zeng et al.
Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit
Blake Bordelon, Lorenzo Noci, Mufan Li et al.
Data Shapley in One Training Run
Jiachen (Tianhao) Wang, Prateek Mittal, Dawn Song et al.
Towards Continual Knowledge Graph Embedding via Incremental Distillation
Jiajun Liu, Wenjun Ke, Peng Wang et al.
XKD: Cross-Modal Knowledge Distillation with Domain Alignment for Video Representation Learning
Pritam Sarkar, Ali Etemad
Synthetic continued pretraining
Zitong Yang, Neil Band, Shuangping Li et al.
Translate Meanings, Not Just Words: IdiomKB’s Role in Optimizing Idiomatic Translation with Language Models
Shuang Li, Jiangjie Chen, Siyu Yuan et al.
LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
Fangxun Shu, Yue Liao, Lei Zhang et al.
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
Rylan Schaeffer, Hailey Schoelkopf, Brando Miranda et al.
Distilling Autoregressive Models to Obtain High-Performance Non-autoregressive Solvers for Vehicle Routing Problems with Faster Inference Speed
Yubin Xiao, Di Wang, Boyang Li et al.
Dataset Distillation by Automatic Training Trajectories
Dai Liu, Jindong Gu, Hu Cao et al.
Dataset Distillation with Neural Characteristic Function: A Minmax Perspective
Shaobo Wang, Yicun Yang, Zhiyuan Liu et al.
Distribution-aware Knowledge Prototyping for Non-exemplar Lifelong Person Re-identification
Kunlun Xu, Xu Zou, Yuxin Peng et al.
Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting
Yuqi Li, Chuanguang Yang, Hansheng Zeng et al.
Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling
Wenda Xu, Rujun Han, Zifeng Wang et al.
KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems
Jusheng Zhang, Zimeng Huang, Yijia Fan et al.
eTag: Class-Incremental Learning via Embedding Distillation and Task-Oriented Generation
Libo Huang, Yan Zeng, Chuanguang Yang et al.
KGGen: Extracting Knowledge Graphs from Plain Text with Language Models
Belinda Mo, Kyssen Yu, Joshua Kazdan et al.
DTL: Disentangled Transfer Learning for Visual Recognition
Minghao Fu, Ke Zhu, Jianxin Wu
Training-Free Pretrained Model Merging
Zhengqi Xu, Ke Yuan, Huiqiong Wang et al.
Specialized Foundation Models Struggle to Beat Supervised Baselines
Zongzhe Xu, Ritvik Gupta, Wenduo Cheng et al.
VkD: Improving Knowledge Distillation using Orthogonal Projections
Roy Miles, Ismail Elezi, Jiankang Deng
FM-OV3D: Foundation Model-Based Cross-Modal Knowledge Blending for Open-Vocabulary 3D Detection
Dongmei Zhang, Chang Li, Renrui Zhang et al.
Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning
Somnath Basu Roy Chowdhury, Krzysztof Choromanski, Arijit Sehanobish et al.
Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models
Xiao Cui, Mo Zhu, Yulei Qin et al.
Attention Distillation: A Unified Approach to Visual Characteristics Transfer
Yang Zhou, Xu Gao, Zichong Chen et al.
Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians
Ishan Amin, Sanjeev Raja, Aditi Krishnapriyan
Unlocking Dataset Distillation with Diffusion Models
Brian Moser, Federico Raue, Sebastian Palacio et al.
Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation
Marco Mistretta, Alberto Baldrati, Marco Bertini et al.
An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning
Jianqing Zhang, Yang Liu, Yang Hua et al.
Embarrassingly Simple Dataset Distillation
Yunzhen Feng, Shanmukha Ramakrishna Vedantam, Julia Kempe
To Grok or not to Grok: Disentangling Generalization and Memorization on Corrupted Algorithmic Datasets
Darshil Doshi, Aritra Das, Tianyu He et al.
You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation
Mehdi Noroozi, Isma Hadji, Brais Martinez et al.
Merging on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging
Anke Tang, Enneng Yang, Li Shen et al.
UNIC: Universal Classification Models via Multi-teacher Distillation
Yannis Kalantidis, Diane Larlus, Mert Bulent Sariyildiz et al.
Towards Adversarially Robust Dataset Distillation by Curvature Regularization
Eric Xue, Yijiang Li, Haoyang Liu et al.
Good Teachers Explain: Explanation-Enhanced Knowledge Distillation
Amin Parchami, Moritz Böhle, Sukrut Rao et al.
De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts
Yuzheng Wang, Dingkang Yang, Zhaoyu Chen et al.
Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective
Ming Zhong, Chenxin An, Weizhu Chen et al.
Mixture of Noise for Pre-Trained Model-Based Class-Incremental Learning
Kai Jiang, Zhengyan Shi, Dell Zhang et al.
MiniPLM: Knowledge Distillation for Pre-training Language Models
Yuxian Gu, Hao Zhou, Fandong Meng et al.
Mirage: Model-agnostic Graph Distillation for Graph Classification
Mridul Gupta, Sahil Manchanda, Hariprasad Kodamana et al.
Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models
Yu-Chu Yu, Chi-Pin Huang, Jr-Jen Chen et al.
KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling
Yu Wang, Xin Li, Shengzhao Wen et al.
History Matters: Temporal Knowledge Editing in Large Language Model
Xunjian Yin, Jin Jiang, Liming Yang et al.
Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs
Sungmin Cha, Sungjun Cho, Dasol Hwang et al.
Tracing Representation Progression: Analyzing and Enhancing Layer-Wise Similarity
Jiachen Jiang, Jinxin Zhou, Zhihui Zhu
Universal Cross-Tokenizer Distillation via Approximate Likelihood Matching
Benjamin Minixhofer, Ivan Vulić, Edoardo Maria Ponti
Active Data Curation Effectively Distills Large-Scale Multimodal Models
Vishaal Udandarao, Nikhil Parthasarathy, Muhammad Ferjad Naeem et al.
A Good Learner can Teach Better: Teacher-Student Collaborative Knowledge Distillation
Ayan Sengupta, Shantanu Dixit, Md Shad Akhtar et al.
Knowledge Editing with Dynamic Knowledge Graphs for Multi-Hop Question Answering
Yifan Lu, Yigeng Zhou, Jing Li et al.
Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models
Weiwei Cao, Jianpeng Zhang, Yingda Xia et al.
Deep Structural Knowledge Exploitation and Synergy for Estimating Node Importance Value on Heterogeneous Information Networks
Yankai Chen, Yixiang Fang, Qiongyan Wang et al.
AMD: Automatic Multi-step Distillation of Large-scale Vision Models
Cheng Han, Qifan Wang, Sohail A Dianat et al.
Large Language Model Meets Graph Neural Network in Knowledge Distillation
Shengxiang Hu, Guobing Zou, Song Yang et al.
Task-Driven Causal Feature Distillation: Towards Trustworthy Risk Prediction
Zhixuan Chu, Mengxuan Hu, Qing Cui et al.
Distilling Reliable Knowledge for Instance-Dependent Partial Label Learning
Dong-Dong Wu, Deng-Bao Wang, Min-Ling Zhang
Fine-Grained Knowledge Selection and Restoration for Non-exemplar Class Incremental Learning
Jiang-Tian Zhai, Xialei Liu, Lu Yu et al.
Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching
Ruonan Yu, Songhua Liu, Jingwen Ye et al.
The Illusion of Unlearning: The Unstable Nature of Machine Unlearning in Text-to-Image Diffusion Models
Naveen George, Karthik Nandan Dasaraju, Rutheesh Reddy Chittepu et al.
Asymmetric Masked Distillation for Pre-Training Small Foundation Models
Zhiyu Zhao, Bingkun Huang, Sen Xing et al.
TabDPT: Scaling Tabular Foundation Models on Real Data
Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh et al.
Cost-efficient Collaboration between On-device and Cloud Language Models
Avanika Narayan, Dan Biderman, Sabri Eyuboglu et al.
Adapting Fine-Grained Cross-View Localization to Areas without Fine Ground Truth
Zimin Xia, Yujiao Shi, Hongdong Li et al.
Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition
Chuanguang Yang, Xinqiang Yu, Han Yang et al.
Unlocking Efficient, Scalable, and Continual Knowledge Editing with Basis-Level Representation Fine-Tuning
Tianci Liu, Ruirui Li, Yunzhe Qi et al.
Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation
Zhengyuan Xie, Haiquan Lu, Jia-wen Xiao et al.
Minimum-Norm Interpolation Under Covariate Shift
Neil Mallinar, Austin Zane, Spencer Frei et al.
Precise Localization of Memories: A Fine-grained Neuron-level Knowledge Editing Technique for LLMs
Haowen Pan, Xiaozhi Wang, Yixin Cao et al.
PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor
Jaewon Jung, Hongsun Jang, Jaeyong Song et al.
KnowPO: Knowledge-Aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models
Ruizhe Zhang, Yongxin Xu, Yuzhen Xiao et al.
How to Train the Teacher Model for Effective Knowledge Distillation
Shayan Mohajer Hamidi, Xizhen Deng, Renhao Tan et al.
Knowledge-Aware Parameter Coaching for Personalized Federated Learning
Mingjian Zhi, Yuanguo Bi, Wenchao Xu et al.
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
Haiwen Diao, Bo Wan, Xu Jia et al.
Knowledge Localization: Mission Not Accomplished? Enter Query Localization!
Yuheng Chen, Pengfei Cao, Yubo Chen et al.
Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models
Zheng Hu, Zhe Li, Ziyun Jiao et al.
Distilling LLM Agent into Small Models with Retrieval and Code Tools
Minki Kang, Jongwon Jeong, Seanie Lee et al.
LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection
Sanmin Kim, Youngseok Kim, Sihwan Hwang et al.
Temporal Separation with Entropy Regularization for Knowledge Distillation in Spiking Neural Networks
Kairong Yu, Chengting Yu, Tianqing Zhang et al.
MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection
Bokai Lin, Zihao Zeng, Zipeng Xiao et al.
Let All Be Whitened: Multi-Teacher Distillation for Efficient Visual Retrieval
Zhe Ma, Jianfeng Dong, Shouling Ji et al.
GIFT: Unlocking Full Potential of Labels in Distilled Dataset at Near-zero Cost
Xinyi Shang, Peng Sun, Tao Lin
Quantized Prompt for Efficient Generalization of Vision-Language Models
Tianxiang Hao, Xiaohan Ding, Juexiao Feng et al.
Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks
Shikai Qiu, Lechao Xiao, Andrew Wilson et al.
Active Object Detection with Knowledge Aggregation and Distillation from Large Models
Dejie Yang, Yang Liu
Cs2K: Class-specific and Class-shared Knowledge Guidance for Incremental Semantic Segmentation
Wei Cong, Yang Cong, Yuyang Liu et al.
Hybrid Distillation: Connecting Masked Autoencoders with Contrastive Learners
Bowen Shi, Xiaopeng Zhang, Yaoming Wang et al.