As the Structural Genomics Consortium (SGC) enters its third decade, it is pursuing an ambitious goal: to become the world’s largest generator of open-source protein-ligand data. Recognizing the growing importance of data science in early-stage drug discovery, SGC is launching a new initiative aimed at training the next generation of computational and data scientists. Supported by a grant from the University of Toronto’s Data Science Institute, the program will provide trainees with the skills necessary to interpret complex experimental data and drive breakthroughs in drug discovery.
At the core of the program is a collaborative framework that bridges the gap between data scientists and experimental scientists, two groups that often work in isolation. “Our experience with machine learning teams in the pharmaceutical industry has shown that understanding how experimental data is generated is crucial for developing effective machine learning strategies,” said Matthieu Schapira, SGC principal investigator at the University of Toronto, who is leading the program. “Similarly, it’s essential for bench scientists to understand how data scientists approach the analysis and interpretation of experimental data.”
The program is structured around four key objectives:
1. Recruit: Build a network of data science and biophysics experts at the University of Toronto to mentor and support trainees.
2. Train: Offer boot camps and workshops led by domain experts to teach trainees how experimental data is generated, curated, and exploited to train models for drug candidate prediction.
3. Challenge: Assign trainees a dataset with blinded hits, challenging them to develop and apply novel variations of methods learned during training.
4. Collaborate: Trainees who succeed in the challenge will predict prospective hits, which biophysics trainees will test experimentally; the findings will then be published.
Alongside Schapira, Mohamad Moosavi, Rachel Harding, and Christopher Maddison will seek to foster collaboration between data scientists and experimentalists while advancing the use of open science data in drug discovery.
The first 9-week boot camp, open to students, postdoctoral researchers, and staff with computer or biological science backgrounds, will take place from February to April 2025. Interested individuals are encouraged to apply by January 12, 2025, to secure a spot with complimentary registration.
“This program will empower Canadian trainees to make significant, real-world contributions to early-stage drug discovery using cutting-edge data science techniques,” Schapira added. “While this pilot project is limited to the University of Toronto, the goal is to expand it nationwide and build a vibrant community of data and experimental scientists working together to advance drug discovery.”
Read more: https://datasciences.utoronto.ca/early-stage-drug-discovery/