ICLR Poster "transformer-based models" Papers
2 papers found
From Promise to Practice: Realizing High-performance Decentralized Training
Zesen Wang, Jiaojiao Zhang, Xuyang Wu et al.
ICLR 2025 poster, arXiv:2410.11998
2 citations
Retrieval Head Mechanistically Explains Long-Context Factuality
Wenhao Wu, Yizhong Wang, Guangxuan Xiao et al.
ICLR 2025 poster, arXiv:2404.15574
140 citations