On LLM Knowledge Distillation - A Comparison between Forward KL and Reverse KL

0 citations

#2434 of 3827 papers in ICLR 2025

Top Authors

Yihan Cao, Yanbin Kang

Topics

knowledge distillation, large language models, KL divergence, forward KL divergence, reverse KL divergence, model compression

Abstract

In this blog post, we delve into knowledge distillation techniques for Large Language Models (LLMs), with a particular focus on using Kullback-Leibler (KL) Divergence as the optimization objective. Knowledge distillation is a powerful tool to reduce model size while maintaining comparable performance, making it especially useful in scenarios with constrained computational or serving resources. We specifically explore the nuances of Forward KL divergence and Reverse KL divergence, examining their roles in the distillation process. By comparing these two approaches, we aim to uncover their behaviours, strengths, and practical applications in LLM distillation.
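
As a rough illustration of the two objectives named in the abstract, the sketch below computes both divergences for a single next-token distribution. It assumes the common convention in which forward KL is KL(teacher || student) and reverse KL is KL(student || teacher); the abstract itself does not spell out this convention. The function names and the toy probabilities are hypothetical and chosen only for illustration.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    # Terms with p_i == 0 contribute nothing; eps guards against log of a zero q_i.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def forward_kl(teacher, student):
    # Assumed convention: the expectation is taken under the teacher distribution.
    return kl_divergence(teacher, student)

def reverse_kl(teacher, student):
    # Assumed convention: the expectation is taken under the student distribution.
    return kl_divergence(student, teacher)

# Toy next-token distributions over a three-token vocabulary (illustrative values only).
teacher = [0.5, 0.4, 0.1]
student = [0.8, 0.15, 0.05]

fkl = forward_kl(teacher, student)
rkl = reverse_kl(teacher, student)
print(f"forward KL = {fkl:.4f}, reverse KL = {rkl:.4f}")
```

The two values generally differ because KL divergence is not symmetric, which is why the choice of direction can change what the student learns to prioritize.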
