"language modeling" Papers

31 papers found

Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking

Heli Ben-Hamu, Itai Gat, Daniel Severo et al.

NeurIPS 2025 · poster · arXiv:2505.24857
40 citations

Chunk-Distilled Language Modeling

Yanhong Li, Karen Livescu, Jiawei Zhou

ICLR 2025 · poster · arXiv:2501.00343
3 citations

Continuous Diffusion Model for Language Modeling

Jaehyeong Jo, Sung Ju Hwang

NeurIPS 2025 · poster · arXiv:2502.11564
4 citations

Differential Transformer

Tianzhu Ye, Li Dong, Yuqing Xia et al.

ICLR 2025 · poster · arXiv:2410.05258

From Bytes to Ideas: Language Modeling with Autoregressive U-Nets

Mathurin Videau, Badr Youbi Idrissi, Alessandro Leite et al.

NeurIPS 2025 · poster · arXiv:2506.14761
5 citations

Next Semantic Scale Prediction via Hierarchical Diffusion Language Models

Cai Zhou, Chenyu Wang, Dinghuai Zhang et al.

NeurIPS 2025 · poster
3 citations

Scaling up Masked Diffusion Models on Text

Shen Nie, Fengqi Zhu, Chao Du et al.

ICLR 2025 · oral · arXiv:2410.18514
110 citations

ShortListing Model: A Streamlined Simplex Diffusion for Discrete Variable Generation

Yuxuan Song, Zhe Zhang, Yu Pei et al.

NeurIPS 2025 · poster
1 citation

Tight Clusters Make Specialized Experts

Stefan Nielsen, Rachel Teo, Laziz Abdullaev et al.

ICLR 2025 · poster · arXiv:2502.15315
6 citations

AMPA: Adaptive Mixed Precision Allocation for Low-Bit Integer Training

Li Ding, Wen Fei, Yuyang Huang et al.

ICML 2024 · poster

An Independence-promoting Loss for Music Generation with Language Models

Jean-Marie Lemercier, Simon Rouard, Jade Copet et al.

ICML 2024 · poster

Cached Transformers: Improving Transformers with Differentiable Memory Cache

Zhaoyang Zhang, Wenqi Shao, Yixiao Ge et al.

AAAI 2024 · paper · arXiv:2312.12742
5 citations

Can Mamba Learn How To Learn? A Comparative Study on In-Context Learning Tasks

Jong Ho Park, Jaden Park, Zheyang Xiong et al.

ICML 2024 · poster

Differentiable Model Scaling using Differentiable Topk

Kai Liu, Ruohui Wang, Jianfei Gao et al.

ICML 2024 · poster

Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution

Aaron Lou, Chenlin Meng, Stefano Ermon

ICML 2024 · poster

Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems

David T. Hoffmann, Simon Schrodi, Jelena Bratulić et al.

ICML 2024 · poster

Exploring Transformer Extrapolation

Zhen Qin, Yiran Zhong, Hui Deng

AAAI 2024 · paper · arXiv:2307.10156
12 citations

Gated Linear Attention Transformers with Hardware-Efficient Training

Songlin Yang, Bailin Wang, Yikang Shen et al.

ICML 2024 · poster

Improving Transformers with Dynamically Composable Multi-Head Attention

Da Xiao, Qingye Meng, Shengping Li et al.

ICML 2024 · poster

In-Context Language Learning: Architectures and Algorithms

Ekin Akyürek, Bailin Wang, Yoon Kim et al.

ICML 2024 · poster

Matrix Information Theory for Self-Supervised Learning

Yifan Zhang, Zhiquan Tan, Jingqin Yang et al.

ICML 2024 · poster

Modeling Language Tokens as Functionals of Semantic Fields

Zhengqi Pei, Anran Zhang, Shuhui Wang et al.

ICML 2024 · poster

MultiMax: Sparse and Multi-Modal Attention Learning

Yuxuan Zhou, Mario Fritz, Margret Keuper

ICML 2024 · poster

PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels

Praneeth Kacham, Vahab Mirrokni, Peilin Zhong

ICML 2024 · poster

Positive Concave Deep Equilibrium Models

Mateusz Gabor, Tomasz Piotrowski, Renato L. G. Cavalcante

ICML 2024 · poster

SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization

Jialong Guo, Xinghao Chen, Yehui Tang et al.

ICML 2024 · poster

SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms

Xingrun Xing, Zheng Zhang, Ziyi Ni et al.

ICML 2024 · poster

StableMask: Refining Causal Masking in Decoder-only Transformer

Qingyu Yin, Xuzheng He, Xiang Zhuang et al.

ICML 2024 · poster

Stay on Topic with Classifier-Free Guidance

Guillaume Sanchez, Alexander Spangher, Honglu Fan et al.

ICML 2024 · spotlight

Trainable Transformer in Transformer

Abhishek Panigrahi, Sadhika Malladi, Mengzhou Xia et al.

ICML 2024 · poster

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Tri Dao, Albert Gu

ICML 2024 · poster