The Research Informatics group is a close collaboration between the SGC and the Kennedy Institute of Rheumatology, allowing sharing of technologies and ideas between the two organisations.
Laboratory Information Management Systems (LIMS)
The SGC is a paperless environment in terms of laboratory and experiment data logging. The RI team at SGC Oxford has pioneered the approach of providing an Electronic Lab Notebook (ELN) for qualitative information such as experimental methods, results (such as gel images) and interpretations within a high-throughput human structural biology pipeline. This is complemented by a quantitative LIMS platform known as Scarab, which records information about the process at each point of a target's lifetime, from DNA to protein structure. Scarab also provides a sophisticated querying interface that allows one to mine almost any combination of data, providing opportunities to assess pinch-points and inefficiencies in our experimental approaches. Research into improving ways of representing and mining such data continues.
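As an illustration of the kind of pinch-point analysis described above, the following is a minimal sketch only. It does not use Scarab or its query interface; the stage names, the record layout and the example data are all assumptions made up for this example. It counts how many targets reach each stage and reports the step with the lowest conversion rate.

```python
from collections import Counter

# Hypothetical pipeline stages, ordered from DNA to structure (assumed names).
STAGES = ["cloned", "expressed", "purified", "crystallised", "structure_solved"]


def stage_counts(furthest_stage_by_target):
    """Count the targets that reached each stage or any later stage."""
    reached = Counter()
    for stage in furthest_stage_by_target.values():
        last = STAGES.index(stage)
        for earlier in STAGES[: last + 1]:
            reached[earlier] += 1
    return [reached[s] for s in STAGES]


def conversion_rates(counts):
    """Fraction of targets passing from each stage to the next one."""
    rates = {}
    for i in range(len(STAGES) - 1):
        step = f"{STAGES[i]} -> {STAGES[i + 1]}"
        rates[step] = counts[i + 1] / counts[i] if counts[i] else None
    return rates


def pinch_point(rates):
    """Return the step with the lowest conversion rate."""
    valid = {step: rate for step, rate in rates.items() if rate is not None}
    return min(valid, key=valid.get)


# Made-up records: target identifier -> furthest stage reached.
example = {
    "T1": "structure_solved",
    "T2": "purified",
    "T3": "purified",
    "T4": "expressed",
    "T5": "cloned",
    "T6": "structure_solved",
}

counts = stage_counts(example)
rates = conversion_rates(counts)
print(counts)
print(pinch_point(rates))
```

In this made-up data set, half of the purified targets go on to crystallise, so that step is reported as the pinch-point.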
Key external technologies: MolSoft Scarab
The RI team is responsible for the maintenance and upkeep of the various SGC project target lists. This involves assignment of target identifiers, assessment of suitable sources of DNA ('entry clones') and day-to-day maintenance of target statuses. The RI team have developed a number of custom approaches to ensure that the target list is suitably annotated with appropriate information from other databases. Our ScooperSnooper platform checks the PDB and other sources of structural genomics information such as TargetDB to ensure that the SGC does not work on targets/domains that have already been deposited in the PDB or are about to be.
Target and domain information is maintained in a central internal LIMS (see above). Large-scale sequence searches against external databases such as the Mammalian Gene Collection (MGC) are pre-calculated and stored in the database for automatic data retrieval.
Key technologies: BLAST; Perl
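The idea of pre-calculating sequence searches and storing the hits for later retrieval can be sketched as follows. This is an assumption-laden illustration, not the SGC schema or the ScooperSnooper implementation: the table layout, the identifiers, the 95% identity threshold and the function names are all hypothetical, and an in-memory SQLite database stands in for the real LIMS.

```python
import sqlite3

# Made-up pre-calculated hits: (target, source database, hit, percent identity).
EXAMPLE_HITS = [
    ("TGT001", "MGC", "CLONE-A", 99.5),
    ("TGT001", "MGC", "CLONE-B", 96.0),
    ("TGT001", "PDB", "STRUCT-X", 41.0),
    ("TGT002", "PDB", "STRUCT-Y", 100.0),
]


def build_cache(rows):
    """Store pre-calculated search hits in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hits ("
        "target_id TEXT, source_db TEXT, hit_id TEXT, identity REAL)"
    )
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def best_entry_clone(conn, target_id, min_identity=95.0):
    """Return the closest stored MGC hit for a target, or None."""
    return conn.execute(
        "SELECT hit_id, identity FROM hits "
        "WHERE target_id = ? AND source_db = 'MGC' AND identity >= ? "
        "ORDER BY identity DESC LIMIT 1",
        (target_id, min_identity),
    ).fetchone()


def has_close_pdb_match(conn, target_id, min_identity=95.0):
    """True if a stored PDB hit suggests the target is already deposited."""
    row = conn.execute(
        "SELECT 1 FROM hits "
        "WHERE target_id = ? AND source_db = 'PDB' AND identity >= ? LIMIT 1",
        (target_id, min_identity),
    ).fetchone()
    return row is not None


conn = build_cache(EXAMPLE_HITS)
print(best_entry_clone(conn, "TGT001"))
print(has_close_pdb_match(conn, "TGT002"))
```

Because the searches are run once and only the results are stored, looking up an entry clone or a possible existing structure becomes a simple database query rather than a fresh sequence search.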
Key external databases: RefSeq; PDB; Pfam; SMART
We employ many in silico approaches to enhance our target success rate and to help interpret the structural findings from novel target structures. Compounds which bind to targets and enhance their ability to crystallise are prioritised for purchase using an in-house virtual library of compounds that can be searched using compound scaffolds. Once the structure of a target is known, techniques such as docking or Virtual Ligand Screening (VLS) can be used to predict potential small molecule binders which may be precursors to 'chemical probes'. Since the SGC works on protein families, we often have a unique opportunity to perform 'intra-family' structural analyses of targets, providing suggestions about how their subtle differences in structure may explain their different functional roles in vivo.
We have developed, in conjunction with GSK and the SABS-IDC doctoral training programme, a number of web-based tools for the analysis of ensembles of protein/ligand structures. Using a 3D Matched-Molecular Pair approach, WONKA provides a novel interface for identifying unusual binding features as well as conserved waters and residue side chain movements. OOMMPPAA extends this analysis to include biochemical/biophysical data, making it easy to take advantage of the breadth of negative data which is often discarded.
We provide facilities for SGC Oxford scientists to determine which compounds can be purchased from vendors. The RI team also collates and curates the SGC in-house compound collection, assigning unique identifiers (global IDs) to each compound to enable seamless communication both within the SGC and also with collaborators and partners.
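The assignment of global IDs can be pictured with a small registry like the one below. This is a hedged sketch rather than the SGC's actual scheme: the `CompoundRegistry` class, the `CMPD-` prefix and the zero-padded numbering are invented for illustration, and the sketch assumes each compound arrives with a canonical structure key (for example an InChIKey) computed elsewhere.

```python
class CompoundRegistry:
    """Assign one stable identifier per distinct structure key (hypothetical)."""

    def __init__(self, prefix="CMPD"):
        self._prefix = prefix
        self._ids = {}  # structure key -> global ID

    def register(self, structure_key):
        """Return the existing ID for a known compound, or mint a new one."""
        key = structure_key.strip()
        if not key:
            raise ValueError("structure key must not be empty")
        if key not in self._ids:
            self._ids[key] = f"{self._prefix}-{len(self._ids) + 1:06d}"
        return self._ids[key]

    def lookup(self, structure_key):
        """Return the ID for a structure key, or None if it is unregistered."""
        return self._ids.get(structure_key.strip())


# Placeholder keys stand in for real canonical structure keys.
registry = CompoundRegistry()
first = registry.register("KEY-A")
second = registry.register("KEY-B")
repeat = registry.register("KEY-A")
print(first, second, repeat)
```

Registering the same structure twice returns the same identifier, which is the property that lets in-house scientists and external partners refer to one compound unambiguously.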
Key external technologies: MolSoft MolCart and ICM-Pro; Knime
Key external databases: ChEMBL; PubChem
The RI team develops and maintains the SGC Global web site as well as the SGC Oxford web site. We aim to keep these sites updated in a dynamic manner whilst developing novel ways of providing the SGC's findings to the general scientific public.
Key external technologies: Drupal; Django; Apache Web Server; PHP; MySQL
Dissemination of Data to the Public - iSee
SGC Oxford's RI team, in conjunction with MolSoft LLC, have pioneered a novel method of gathering diverse information regarding the structure solution of a target and making it available in one 'datapack' which is intuitive to use. iSee datapacks contain structural visualisations of key aspects of each solved structure, annotated by experts on those targets. Materials and Methods giving information on how to reproduce our work, along with facts relating to the targets themselves, are also included. Work is ongoing to expand this platform further, making it even easier for non-structural biologists to understand the scientific consequences of our work.
Key external technologies: MolSoft ICM and activeICM
The Research Informatics team is also responsible for the design, implementation and maintenance of all aspects of IT. These include large servers which hold experimental data, email, web sites and LIMS platforms, as well as PCs connected to lab machines. We also provide desktop machines for SGC Oxford members and day-to-day support. Data backup and archiving are also critical requirements which we fulfil.
Our team also includes the IT and Informatics group of the Kennedy Institute in their new building on the Old Road Campus.