Papers by Guangyu Song
2 papers found
MMTEB: Massive Multilingual Text Embedding Benchmark
Kenneth Enevoldsen, Isaac Chung, Imene Kerboua et al.
ICLR 2025 poster, arXiv:2502.13595
74 citations
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
Nikhil Kandpal, Brian Lester, Colin Raffel et al.
NeurIPS 2025 poster, arXiv:2506.05209
10 citations