ICML 2024 "language modeling" Papers

20 papers found

AMPA: Adaptive Mixed Precision Allocation for Low-Bit Integer Training

Li Ding, Wen Fei, Yuyang Huang et al.

ICML 2024 poster

An Independence-promoting Loss for Music Generation with Language Models

Jean-Marie Lemercier, Simon Rouard, Jade Copet et al.

ICML 2024 poster · arXiv:2406.02315

Can Mamba Learn How To Learn? A Comparative Study on In-Context Learning Tasks

Jong Ho Park, Jaden Park, Zheyang Xiong et al.

ICML 2024 poster · arXiv:2402.04248

Differentiable Model Scaling using Differentiable Topk

Kai Liu, Ruohui Wang, Jianfei Gao et al.

ICML 2024 poster · arXiv:2405.07194

Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution

Aaron Lou, Chenlin Meng, Stefano Ermon

ICML 2024 poster · arXiv:2310.16834

Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems

David T. Hoffmann, Simon Schrodi, Jelena Bratulić et al.

ICML 2024 poster · arXiv:2310.12956

Gated Linear Attention Transformers with Hardware-Efficient Training

Songlin Yang, Bailin Wang, Yikang Shen et al.

ICML 2024 poster · arXiv:2312.06635

Improving Transformers with Dynamically Composable Multi-Head Attention

Da Xiao, Qingye Meng, Shengping Li et al.

ICML 2024 poster · arXiv:2405.08553

In-Context Language Learning: Architectures and Algorithms

Ekin Akyürek, Bailin Wang, Yoon Kim et al.

ICML 2024 poster · arXiv:2401.12973

Matrix Information Theory for Self-Supervised Learning

Yifan Zhang, Zhiquan Tan, Jingqin Yang et al.

ICML 2024 poster · arXiv:2305.17326

Modeling Language Tokens as Functionals of Semantic Fields

Zhengqi Pei, Anran Zhang, Shuhui Wang et al.

ICML 2024 poster

MultiMax: Sparse and Multi-Modal Attention Learning

Yuxuan Zhou, Mario Fritz, Margret Keuper

ICML 2024 poster · arXiv:2406.01189

PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels

Praneeth Kacham, Vahab Mirrokni, Peilin Zhong

ICML 2024 poster · arXiv:2310.01655

Positive Concave Deep Equilibrium Models

Mateusz Gabor, Tomasz Piotrowski, Renato L. G. Cavalcante

ICML 2024 poster · arXiv:2402.04029

SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization

Jialong Guo, Xinghao Chen, Yehui Tang et al.

ICML 2024 poster · arXiv:2405.11582

SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms

Xingrun Xing, Zheng Zhang, Ziyi Ni et al.

ICML 2024 poster · arXiv:2406.03287

StableMask: Refining Causal Masking in Decoder-only Transformer

Qingyu Yin, Xuzheng He, Xiang Zhuang et al.

ICML 2024 poster · arXiv:2402.04779

Stay on Topic with Classifier-Free Guidance

Guillaume Sanchez, Alexander Spangher, Honglu Fan et al.

ICML 2024 spotlight · arXiv:2306.17806

Trainable Transformer in Transformer

Abhishek Panigrahi, Sadhika Malladi, Mengzhou Xia et al.

ICML 2024 poster · arXiv:2307.01189

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Tri Dao, Albert Gu

ICML 2024 poster · arXiv:2405.21060