ProteinConformers: Benchmark Dataset for Simulating Protein Conformational Landscape Diversity and Plausibility

NeurIPS 2025
7 authors

Abstract

Understanding the conformational landscape of proteins is essential for elucidating protein function and facilitating drug design. However, existing protein conformation benchmarks fail to capture the full energy landscape, limiting their ability to evaluate the diversity and physical plausibility of AI-generated structures. We introduce ProteinConformers, a large-scale benchmark dataset comprising over 381,000 physically realistic conformations for 87 CASP targets. These were derived from more than 40,000 structural decoys via extensive all-atom molecular dynamics (MD) simulations totaling over 6 million CPU hours. Using this dataset, we propose novel metrics to evaluate conformational diversity and plausibility, and systematically benchmark six protein conformation generative models. Our results show that leveraging large-scale protein sequence data can enhance a model's ability to explore conformational space, potentially reducing reliance on MD-derived data. Additionally, we find that PDB and MD datasets influence model performance differently, and that current models perform well on inter-atomic distance prediction but struggle to generate accurate inter-residue orientations. Overall, our dataset, evaluation metrics, and benchmarking results provide the first comprehensive foundation for assessing generative models in protein conformational modeling. The dataset and instructions are available at https://huggingface.co/datasets/Jim990908/ProteinConformers/tree/main. The code is available at https://github.com/auroua/ProteinConformers. An interactive website is available at https://zhanggroup.org/ProteinConformers.
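The abstract does not define the proposed diversity and plausibility metrics. As a point of reference only, the sketch below shows one generic, alignment-free way to score how spread out a conformational ensemble is. It computes the mean pairwise distance RMSD (dRMSD) between conformations from their inter-atomic (for example, C-alpha) distance matrices. This is not the ProteinConformers metric. The choice of dRMSD and the function names `distance_matrix`, `drmsd`, and `ensemble_diversity` are illustrative assumptions.

```python
import math
from itertools import combinations

# Hypothetical type: one atom position (x, y, z) in angstroms.
Coord = tuple[float, float, float]


def distance_matrix(coords: list[Coord]) -> list[list[float]]:
    """Pairwise Euclidean distances between the atoms of one conformation."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def drmsd(a: list[Coord], b: list[Coord]) -> float:
    """Distance RMSD: root-mean-square difference of all inter-atomic distances.

    Distances do not change under rotation or translation, so no superposition
    step is needed (unlike coordinate RMSD).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("conformations must have the same number (>= 2) of atoms")
    da, db = distance_matrix(a), distance_matrix(b)
    pairs = list(combinations(range(len(a)), 2))
    sq = sum((da[i][j] - db[i][j]) ** 2 for i, j in pairs)
    return math.sqrt(sq / len(pairs))


def ensemble_diversity(ensemble: list[list[Coord]]) -> float:
    """Mean pairwise dRMSD over all pairs of conformations in an ensemble."""
    if len(ensemble) < 2:
        return 0.0
    scores = [drmsd(x, y) for x, y in combinations(ensemble, 2)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 3-atom "conformations" for illustration only.
    straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    bent = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
    shifted = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in straight]
    print(drmsd(straight, shifted))           # rigid translation: prints 0.0
    print(f"{drmsd(straight, bent):.3f}")     # prints 0.338
    print(f"{ensemble_diversity([straight, bent, shifted]):.3f}")
```

Because dRMSD compares only inter-atomic distances, it directly reflects the distance-level agreement the abstract says current models handle well. It is blind to chirality and says nothing about inter-residue orientations, which would need separate, angle-based measures.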
