"pre-training efficiency" Papers
3 papers found
Emerging Property of Masked Token for Effective Pre-training
Hyesong Choi, Hunsang Lee, Seyoung Joung et al.
ECCV 2024 poster, arXiv:2404.08330
10 citations
Exploring the Benefit of Activation Sparsity in Pre-training
Zhengyan Zhang, Chaojun Xiao, Qiujieli Qin et al.
ICML 2024 poster
Getting the most out of your tokenizer for pre-training and domain adaptation
Gautier Dagan, Gabriel Synnaeve, Baptiste Roziere
ICML 2024 poster