ICLR 2025 "knowledge distillation" Papers
16 papers found
ADAPT: Attentive Self-Distillation and Dual-Decoder Prediction Fusion for Continual Panoptic Segmentation
Ze Yang, Shichao Dong, Ruibo Li et al.
ICLR 2025 poster
Advantage-Guided Distillation for Preference Alignment in Small Language Models
Shiping Gao, Fanqi Wan, Jiajian Guo et al.
ICLR 2025 poster · arXiv:2502.17927
4 citations
DistillHGNN: A Knowledge Distillation Approach for High-Speed Hypergraph Neural Networks
Saman Forouzandeh, Parham Moradi Dowlatabadi, Mahdi Jalili
ICLR 2025 poster
1 citation
From Models to Microtheories: Distilling a Model's Topical Knowledge for Grounded Question-Answering
Nathaniel Weir, Bhavana Dalvi Mishra, Orion Weller et al.
ICLR 2025 poster · arXiv:2412.17701
3 citations
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
Seanie Lee, Haebin Seong, Dong Bok Lee et al.
ICLR 2025 poster · arXiv:2410.01524
13 citations
Improving Language Model Distillation through Hidden State Matching
Sayantan Dasgupta, Trevor Cohn
ICLR 2025 poster
7 citations
It Helps to Take a Second Opinion: Teaching Smaller LLMs To Deliberate Mutually via Selective Rationale Optimisation
Sohan Patnaik, Milan Aggarwal, Sumit Bhatia et al.
ICLR 2025 poster · arXiv:2503.02463
Learning Diagrams: A Graphical Language for Compositional Training Regimes
Mason Lary, Richard Samuelson, Alexander Wilentz et al.
ICLR 2025 poster
LLaMaFlex: Many-in-one LLMs via Generalized Pruning and Weight Sharing
Ruisi Cai, Saurav Muralidharan, Hongxu Yin et al.
ICLR 2025 poster
4 citations
LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
Fangxun Shu, Yue Liao, Lei Zhang et al.
ICLR 2025 poster · arXiv:2408.15881
34 citations
Self-Updatable Large Language Models by Integrating Context into Model Parameters
Yu Wang, Xinshuang Liu, Xiusi Chen et al.
ICLR 2025 poster · arXiv:2410.00487
5 citations
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Zilong (Ryan) Wang, Zifeng Wang, Long Le et al.
ICLR 2025 poster · arXiv:2407.08223
75 citations
Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning
Yuxiang Lu, Shengcao Cao, Yu-Xiong Wang
ICLR 2025 poster · arXiv:2410.14633
6 citations
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
Makoto Shing, Kou Misaki, Han Bao et al.
ICLR 2025 oral · arXiv:2501.16937
12 citations
Test-Time Ensemble via Linear Mode Connectivity: A Path to Better Adaptation
Byungjai Kim, Chanho Ahn, Wissam Baddar et al.
ICLR 2025 poster
3 citations
UniCoTT: A Unified Framework for Structural Chain-of-Thought Distillation
Xianwei Zhuang, Zhihong Zhu, Zhichang Wang et al.
ICLR 2025 poster
7 citations