Poster by Acyr Locatelli Papers
4 papers found
Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
Laura Ruis, Maximilian Mozes, Juhan Bae et al.
ICLR 2025poster
Rope to Nope and Back Again: A New Hybrid Attention Strategy
Bowen Yang, Bharat Venkitesh, Dwaraknath Gnaneshwar Talupuru et al.
NeurIPS 2025poster
20
citations
To Code or Not To Code? Exploring Impact of Code in Pre-training
Viraat Aryabumi, Yixuan Su, Raymond Ma et al.
ICLR 2025poster
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning
Ted Zadouri, Ahmet Üstün, Arash Ahmadian et al.
ICLR 2024poster
138
citations