"tokenization" Papers
3 papers found
Conference
Boundless Byte Pair Encoding: Breaking the Pre-tokenization Barrier
Craig W Schmidt, Varshini Reddy, Chris Tanner et al.
COLM 2025 paper | arXiv:2504.00178 | 15 citations
SuperBPE: Space Travel for Language Models
Alisa Liu, Jonathan Hayase, Valentin Hofmann et al.
COLM 2025 paper | arXiv:2503.13423 | 34 citations
UTF-8 Plumbing: Byte-level Tokenizers Unavoidably Enable LLMs to Generate Ill-formed UTF-8
Preston Firestone, Shubham Ugare, Gagandeep Singh et al.
COLM 2025 paper | arXiv:2511.05578 | 1 citation