"subword tokenization" Papers
2 papers found
Conference
Boundless Byte Pair Encoding: Breaking the Pre-tokenization Barrier
Craig W Schmidt, Varshini Reddy, Chris Tanner et al.
COLM 2025, paper, arXiv:2504.00178
15 citations
Hierarchical Autoregressive Transformers: Combining Byte- and Word-Level Processing for Robust, Adaptable Language Models
Pit Neitemeier, Björn Deiseroth, Constantin Eichenberg et al.
ICLR 2025, poster, arXiv:2501.10322
11 citations