Last year, a revolution took place in protein structural biology. Driven by the AlphaFold2 algorithm developed by DeepMind, it has given the world predicted structures for nearly all human proteins, many of them accurate at high resolution. This creates an opportunity to use these data to better understand the functional effects of human genetic mutations. With computational tools, we can use the predicted structures to ask whether, say, a mutation destabilises a particular protein. This matters because we do not yet understand the significance of most of the genetic variation in an individual human genome, and there is a great deal of it.
This project will apply an ecosystem of computational tools to this corpus of structural predictions to predict the structural impact of mutations on a genome-wide scale. We will then look for patterns in the resulting data, especially patterns that relate to disease incidence. One challenge will be computing the effects of mutations on protein structures with algorithms ranging from those that run in an HPC environment to those whose results must be screen-scraped from a multitude of web tools. As the data from these sources grow, there will be opportunities to identify patterns and structure in them using machine learning and deep learning methods.
Project aims:
· Develop computationally derived feature sets from predicted protein structures (changes in protein stability and in protein-protein interactions)
· Compute these features at whole-genome/proteome scale
· Integrate these features with other databases of variant information to build ensemble predictors of mutation effect
Python or R programming skills and experience in data science and/or machine learning are required. Experience with, or interest in, biological datasets and biological questions is essential.
Honours and PhD scholarships may be available for exceptional candidates.
In the first instance, please contact Dan.Andrews@anu.edu.au to discuss the project's scope and the potential for tailoring it to your interests and intended career trajectory.
Students will gain:
· Experience with real scientific data and the challenges of deriving metadata and ensuring data consistency
· Practical experience with computational pipelines, potentially in a high-performance computing environment
· An interdisciplinary scientific skillset