ICLR 2025 "knowledge distillation" Papers

16 papers found

ADAPT: Attentive Self-Distillation and Dual-Decoder Prediction Fusion for Continual Panoptic Segmentation

Ze Yang, Shichao Dong, Ruibo Li et al.

ICLR 2025 poster

Advantage-Guided Distillation for Preference Alignment in Small Language Models

Shiping Gao, Fanqi Wan, Jiajian Guo et al.

ICLR 2025 poster · arXiv:2502.17927
4 citations

DistillHGNN: A Knowledge Distillation Approach for High-Speed Hypergraph Neural Networks

Saman Forouzandeh, Parham Moradi Dowlatabadi, Mahdi Jalili

ICLR 2025 poster
1 citation

From Models to Microtheories: Distilling a Model's Topical Knowledge for Grounded Question-Answering

Nathaniel Weir, Bhavana Dalvi Mishra, Orion Weller et al.

ICLR 2025 poster · arXiv:2412.17701
3 citations

HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models

Seanie Lee, Haebin Seong, Dong Bok Lee et al.

ICLR 2025 poster · arXiv:2410.01524
13 citations

Improving Language Model Distillation through Hidden State Matching

Sayantan Dasgupta, Trevor Cohn

ICLR 2025 poster
7 citations

It Helps to Take a Second Opinion: Teaching Smaller LLMs To Deliberate Mutually via Selective Rationale Optimisation

Sohan Patnaik, Milan Aggarwal, Sumit Bhatia et al.

ICLR 2025 poster · arXiv:2503.02463

Learning Diagrams: A Graphical Language for Compositional Training Regimes

Mason Lary, Richard Samuelson, Alexander Wilentz et al.

ICLR 2025 poster

LLaMaFlex: Many-in-one LLMs via Generalized Pruning and Weight Sharing

Ruisi Cai, Saurav Muralidharan, Hongxu Yin et al.

ICLR 2025 poster
4 citations

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

Fangxun Shu, Yue Liao, Lei Zhang et al.

ICLR 2025 poster · arXiv:2408.15881
34 citations

Self-Updatable Large Language Models by Integrating Context into Model Parameters

Yu Wang, Xinshuang Liu, Xiusi Chen et al.

ICLR 2025 poster · arXiv:2410.00487
5 citations

Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting

Zilong (Ryan) Wang, Zifeng Wang, Long Le et al.

ICLR 2025 poster · arXiv:2407.08223
75 citations

Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning

Yuxiang Lu, Shengcao Cao, Yu-Xiong Wang

ICLR 2025 poster · arXiv:2410.14633
6 citations

TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models

Makoto Shing, Kou Misaki, Han Bao et al.

ICLR 2025 oral · arXiv:2501.16937
12 citations

Test-Time Ensemble via Linear Mode Connectivity: A Path to Better Adaptation

Byungjai Kim, Chanho Ahn, Wissam Baddar et al.

ICLR 2025 poster
3 citations

UniCoTT: A Unified Framework for Structural Chain-of-Thought Distillation

Xianwei Zhuang, Zhihong Zhu, Zhichang Wang et al.

ICLR 2025 poster
7 citations