2025 "dataset curation" Papers
4 papers found
Fixing It in Post: A Comparative Study of LLM Post-Training Data Quality and Model Performance
Aladin Djuhera, Swanand Kadhe, Syed Zawad et al.
NeurIPS 2025 · spotlight · arXiv:2506.06522
MEgoHand: Multimodal Egocentric Hand-Object Interaction Motion Generation
Bohan Zhou, Yi Zhan, Zhongbin Zhang et al.
NeurIPS 2025 · oral · arXiv:2505.16602
3 citations
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Navve Wasserman, Noam Rotstein, Roy Ganz et al.
CVPR 2025 · poster · arXiv:2404.18212
29 citations
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
Nikhil Kandpal, Brian Lester, Colin Raffel et al.
NeurIPS 2025 · poster · arXiv:2506.05209
10 citations