2025 "transformer-based models" Papers
5 papers found
AI Progress Should Be Measured by Capability-Per-Resource, Not Scale Alone: A Framework for Gradient-Guided Resource Allocation in LLMs
David McCoy, Yulun Wu, Zachary Butzin-Dozier
NeurIPS 2025 · poster · arXiv:2511.01077
Enhancing the Maximum Effective Window for Long-Term Time Series Forecasting
Jiahui Zhang, Zhengyang Zhou, Wenjie Du et al.
NeurIPS 2025 · poster
From Promise to Practice: Realizing High-performance Decentralized Training
Zesen Wang, Jiaojiao Zhang, Xuyang Wu et al.
ICLR 2025 · poster · arXiv:2410.11998 · 2 citations
Retrieval Head Mechanistically Explains Long-Context Factuality
Wenhao Wu, Yizhong Wang, Guangxuan Xiao et al.
ICLR 2025 · poster · arXiv:2404.15574 · 140 citations
SimpleTM: A Simple Baseline for Multivariate Time Series Forecasting
Hui Chen, Viet Luong, Lopamudra Mukherjee et al.
ICLR 2025 · oral