Accelerating Early Drug Discovery with Open Protein-Small Molecule Binding Data
The SGC is embarking on a new phase that aims to transform drug hit discovery from a lengthy, largely experimental process into a fast-paced, data-driven computational science. Our plan for the next five years is to create high-quality, openly accessible protein-ligand datasets that are compatible with machine learning applications. By setting community challenges, focused initially on hit-finding and hit optimization, we will benchmark machine learning approaches, with SGC hubs experimentally testing the predictions.
Why Generating Data is Needed
Transforming drug discovery into a computational process using ML/AI has the potential to significantly accelerate the identification of new therapeutic agents. However, a major hurdle is the lack of comprehensive, high-quality data for training sophisticated computational models. This shortage, combined with limited data accessibility and limited empirical testing of predictions, severely restricts AI's ability to make accurate predictions and speed up drug discovery. SGC's open data strategy aims to bridge this gap by generating extensive, high-quality datasets that support the training of AI models and make small molecule drug discovery more accessible to all.