2024 "language modeling" Papers
22 papers found
AMPA: Adaptive Mixed Precision Allocation for Low-Bit Integer Training
Li Ding, Wen Fei, Yuyang Huang et al.
An Independence-promoting Loss for Music Generation with Language Models
Jean-Marie Lemercier, Simon Rouard, Jade Copet et al.
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Zhaoyang Zhang, Wenqi Shao, Yixiao Ge et al.
Can Mamba Learn How To Learn? A Comparative Study on In-Context Learning Tasks
Jong Ho Park, Jaden Park, Zheyang Xiong et al.
Differentiable Model Scaling using Differentiable Topk
Kai Liu, Ruohui Wang, Jianfei Gao et al.
Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution
Aaron Lou, Chenlin Meng, Stefano Ermon
Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
David T. Hoffmann, Simon Schrodi, Jelena Bratulić et al.
Exploring Transformer Extrapolation
Zhen Qin, Yiran Zhong, Hui Deng
Gated Linear Attention Transformers with Hardware-Efficient Training
Songlin Yang, Bailin Wang, Yikang Shen et al.
Improving Transformers with Dynamically Composable Multi-Head Attention
Da Xiao, Qingye Meng, Shengping Li et al.
In-Context Language Learning: Architectures and Algorithms
Ekin Akyürek, Bailin Wang, Yoon Kim et al.
Matrix Information Theory for Self-Supervised Learning
Yifan Zhang, Zhiquan Tan, Jingqin Yang et al.
Modeling Language Tokens as Functionals of Semantic Fields
Zhengqi Pei, Anran Zhang, Shuhui Wang et al.
MultiMax: Sparse and Multi-Modal Attention Learning
Yuxuan Zhou, Mario Fritz, Margret Keuper
PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels
Praneeth Kacham, Vahab Mirrokni, Peilin Zhong
Positive Concave Deep Equilibrium Models
Mateusz Gabor, Tomasz Piotrowski, Renato L. G. Cavalcante
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
Jialong Guo, Xinghao Chen, Yehui Tang et al.
SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms
Xingrun Xing, Zheng Zhang, Ziyi Ni et al.
StableMask: Refining Causal Masking in Decoder-only Transformer
Qingyu Yin, Xuzheng He, Xiang Zhuang et al.
Stay on Topic with Classifier-Free Guidance
Guillaume Sanchez, Alexander Spangher, Honglu Fan et al.
Trainable Transformer in Transformer
Abhishek Panigrahi, Sadhika Malladi, Mengzhou Xia et al.
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Tri Dao, Albert Gu