Poster Papers: "language modeling"

34 papers found

Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking

Heli Ben-Hamu, Itai Gat, Daniel Severo et al.

NeurIPS 2025 · poster · arXiv:2505.24857
40 citations

AdaFisher: Adaptive Second Order Optimization via Fisher Information

Damien Gomes, Yanlei Zhang, Eugene Belilovsky et al.

ICLR 2025 · poster · arXiv:2405.16397
5 citations

Chunk-Distilled Language Modeling

Yanhong Li, Karen Livescu, Jiawei Zhou

ICLR 2025 · poster · arXiv:2501.00343
3 citations

Continuous Diffusion Model for Language Modeling

Jaehyeong Jo, Sung Ju Hwang

NeurIPS 2025 · poster · arXiv:2502.11564
4 citations

Differential Transformer

Tianzhu Ye, Li Dong, Yuqing Xia et al.

ICLR 2025 · poster · arXiv:2410.05258

From Bytes to Ideas: Language Modeling with Autoregressive U-Nets

Mathurin Videau, Badr Youbi Idrissi, Alessandro Leite et al.

NeurIPS 2025 · poster · arXiv:2506.14761
5 citations

Glauber Generative Model: Discrete Diffusion Models via Binary Classification

Harshit Varma, Dheeraj Nagaraj, Karthikeyan Shanmugam

ICLR 2025 · poster · arXiv:2405.17035
7 citations

Language Models Are Implicitly Continuous

Samuele Marro, Davide Evangelista, X. Huang et al.

ICLR 2025 · poster · arXiv:2504.03933
3 citations

MIND over Body: Adaptive Thinking using Dynamic Computation

Mrinal Mathur, Barak Pearlmutter, Sergey Plis

ICLR 2025 · poster
2 citations

Nested Learning: The Illusion of Deep Learning Architectures

Ali Behrouz, Meisam Razaviyayn, Peilin Zhong et al.

NeurIPS 2025 · poster · arXiv:2512.24695
12 citations

Next Semantic Scale Prediction via Hierarchical Diffusion Language Models

Cai Zhou, Chenyu Wang, Dinghuai Zhang et al.

NeurIPS 2025 · poster · arXiv:2510.08632
3 citations

Selective Attention Improves Transformer

Yaniv Leviathan, Matan Kalman, Yossi Matias

ICLR 2025 · poster · arXiv:2410.02703
20 citations

ShortListing Model: A Streamlined Simplex Diffusion for Discrete Variable Generation

Yuxuan Song, Zhe Zhang, Yu Pei et al.

NeurIPS 2025 · poster
1 citation

The AdEMAMix Optimizer: Better, Faster, Older

Matteo Pagliardini, Pierre Ablin, David Grangier

ICLR 2025 · poster · arXiv:2409.03137
23 citations

Tight Clusters Make Specialized Experts

Stefan Nielsen, Rachel Teo, Laziz Abdullaev et al.

ICLR 2025 · poster · arXiv:2502.15315
6 citations

AMPA: Adaptive Mixed Precision Allocation for Low-Bit Integer Training

Li Ding, Wen Fei, Yuyang Huang et al.

ICML 2024 · poster

An Independence-promoting Loss for Music Generation with Language Models

Jean-Marie Lemercier, Simon Rouard, Jade Copet et al.

ICML 2024 · poster · arXiv:2406.02315

Can Mamba Learn How To Learn? A Comparative Study on In-Context Learning Tasks

Jong Ho Park, Jaden Park, Zheyang Xiong et al.

ICML 2024 · poster · arXiv:2402.04248

Differentiable Model Scaling using Differentiable Topk

Kai Liu, Ruohui Wang, Jianfei Gao et al.

ICML 2024 · poster · arXiv:2405.07194

Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution

Aaron Lou, Chenlin Meng, Stefano Ermon

ICML 2024 · poster · arXiv:2310.16834

Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems

David T. Hoffmann, Simon Schrodi, Jelena Bratulić et al.

ICML 2024 · poster · arXiv:2310.12956

Gated Linear Attention Transformers with Hardware-Efficient Training

Songlin Yang, Bailin Wang, Yikang Shen et al.

ICML 2024 · poster · arXiv:2312.06635

Improving Transformers with Dynamically Composable Multi-Head Attention

Da Xiao, Qingye Meng, Shengping Li et al.

ICML 2024 · poster · arXiv:2405.08553

In-Context Language Learning: Architectures and Algorithms

Ekin Akyürek, Bailin Wang, Yoon Kim et al.

ICML 2024 · poster · arXiv:2401.12973

Matrix Information Theory for Self-Supervised Learning

Yifan Zhang, Zhiquan Tan, Jingqin Yang et al.

ICML 2024 · poster · arXiv:2305.17326

Modeling Language Tokens as Functionals of Semantic Fields

Zhengqi Pei, Anran Zhang, Shuhui Wang et al.

ICML 2024 · poster

MultiMax: Sparse and Multi-Modal Attention Learning

Yuxuan Zhou, Mario Fritz, Margret Keuper

ICML 2024 · poster · arXiv:2406.01189

PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels

Praneeth Kacham, Vahab Mirrokni, Peilin Zhong

ICML 2024 · poster · arXiv:2310.01655

Positive Concave Deep Equilibrium Models

Mateusz Gabor, Tomasz Piotrowski, Renato L. G. Cavalcante

ICML 2024 · poster · arXiv:2402.04029

SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization

Jialong Guo, Xinghao Chen, Yehui Tang et al.

ICML 2024 · poster · arXiv:2405.11582

SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms

Xingrun Xing, Zheng Zhang, Ziyi Ni et al.

ICML 2024 · poster · arXiv:2406.03287

StableMask: Refining Causal Masking in Decoder-only Transformer

Qingyu Yin, Xuzheng He, Xiang Zhuang et al.

ICML 2024 · poster · arXiv:2402.04779

Trainable Transformer in Transformer

Abhishek Panigrahi, Sadhika Malladi, Mengzhou Xia et al.

ICML 2024 · poster · arXiv:2307.01189

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Tri Dao, Albert Gu

ICML 2024 · poster · arXiv:2405.21060