C2D3 Computational Biology

We are living in a very exciting time for biology: whole-genome sequencing has opened up the field of genome-scale biology and, with it, a trend towards larger-scale experiments, whether based on DNA sequencing or on other technologies such as microscopy. However, it is also a time of great opportunity for small-scale biology, as there is a new wealth of data to build on: one can turn to a computer to ask questions that previously might have taken months to answer in the laboratory. One of the great challenges for the field is analysing the large amounts of complex data generated and synthesising them into useful systems-wide models of biological processes. Whether one operates on a large or a small scale, mathematical and computational methods are becoming an integral part of biological research.

There remains a worldwide shortage of skilled computational biologists. An important part of C2D3 Computational Biology is an MPhil course based at the Centre for Mathematical Sciences. The 11-month course introduces students to bioinformatics and other quantitative aspects of modern biology and medicine. It is intended especially for those whose first degree is in mathematics and computer science, and for others wishing to learn about the subject in preparation for a PhD course or a career in industry. Complementing the MPhil course is the Wellcome Trust PhD programme in Mathematical Genomics and Medicine. Run jointly with the Wellcome Trust Sanger Institute, this programme provides opportunities for collaborative research across the Cambridge region at the exciting interfaces between mathematics, genomics and medicine.

History and financial support 

C2D3 Computational Biology came about through the merger of the Cambridge Computational Biology Institute (CCBI) into C2D3 in 2021. The CCBI was established in 2003 to promote computational biology, interpreted broadly, within the University and in the region. It established the MPhil in Computational Biology programme in 2004, founded the Wellcome Trust Mathematical Genomics and Medicine 4-year PhD programme in 2011, and, among other activities, started a popular annual computational biology symposium. The CCBI was involved in setting up and helping to run the Cambridge Big Data (CBD) Strategic Research Initiative, out of which the C2D3 Interdisciplinary Research Centre was formed. Similarly, the CCBI was part of the group that helped set up the Alan Turing Institute.

The CCBI received financial support equally from the four science schools of the University: 

  • The School of the Biological Sciences      
  • The School of Clinical Medicine      
  • The School of the Physical Sciences (via DAMTP, Physics, Chemistry)      
  • The School of Technology (via Engineering, Computer Science) 

Space was kindly provided by the Department of Applied Mathematics and Theoretical Physics, within the Centre for Mathematical Sciences. 

MPhil in Computational Biology  

The Cambridge-MIT Institute provided funds to establish the MPhil in Computational Biology and subsequently studentships have been provided by: 

  • Biotechnology and Biological Sciences Research Council      
  • Cancer Research UK      
  • Engineering and Physical Sciences Research Council      
  • Medical Research Council      
  • Microsoft Research 

MGM PhD Programme 

The PhD programme in Mathematical Genomics and Medicine is funded by the Wellcome Trust.

Mailing list

To sign up to the mailing list, with the option of also joining the main C2D3 mailing list, please complete the appropriate form here.

Talks

Quantitative Biology Seminar

Monday, 30 September 2024, 1.00pm to 2.00pm
Speaker: Dr Mohammed Lotfollahi, Wellcome Sanger Institute
Venue: CRUK CI Lecture Theatre

Abstract not available

Protein genetic architecture is simple, and epistasis can facilitate the evolution of new functions

Monday, 7 October 2024, 1.30pm to 2.30pm
Speaker: Assistant Professor Brian Metzger, Purdue University
Venue: CRUK CI Lecture Theatre

A protein’s genetic architecture – the set of causal rules by which its sequence determines its function – also determines the effects of mutations and thus the possible evolutionary routes a protein may take. Prior work suggests that the genetic architectures of proteins are complex, with large amounts of high-order epistasis that constrain evolution and limit the ability to predict protein function from sequence. However, prior work may have overstated both the extent and impact of epistasis by analyzing genetic architecture from the perspective of a single reference genotype and failing to fully account for global nonlinearities – both of which can artificially inflate estimates of epistasis – and by considering only a single protein function and direct evolutionary paths between pairs of proteins – both of which make epistasis a constraint on evolution. Here I will describe a reference-free method for inferring protein genetic architecture from combinatorial deep mutational scanning datasets that accounts for global nonlinearities. Applying this approach to 20 previously collected datasets reveals that the genetic architecture of most proteins studied to date is simple: main and pairwise interactions among amino acids, along with a simple nonlinear correction, explain a median of 96% of phenotypic variance (>92% in every case). We further used this approach to dissect the genetic architecture and evolution of a transcription factor’s specificity for DNA by combining deep mutational scanning with ancestral protein reconstruction. As before, the genetic architecture was simple, with few high-order interactions and many main and pairwise interactions instead. However, these pairwise interactions massively expanded the number of opportunities for single-residue mutations to switch specificity from one DNA element to another.
By bringing variants with different specificities close together in sequence space, pairwise epistatic interactions can thus facilitate the evolution of new molecular functions. By reorienting how we estimate epistasis, reference-free analyses can reveal simple and intelligible protein genetic architectures and thus provide an experimentally and analytically tractable route forward for understanding protein genetic architecture and its evolution.

Protein Evolution in Sequence Landscapes - From Data to Models and Back

Monday, 25 November 2024, 12.30pm to 1.30pm
Speaker: Professor Martin Weigt, Institute of Biology, Paris
Venue: CRUK CI Lecture Theatre

In the course of evolution, proteins diversify their sequences via a complex interplay between random mutations and natural selection. As a consequence, we can today observe protein sequences of common evolutionary origin, with almost identical three-dimensional folds and biological functions, which nevertheless differ in as many as 70-80% of their amino acids. In my presentation, I will review our efforts to model protein evolution across multiple time scales, from the emergence of single mutations in a protein up to deep evolutionary time scales. To this end, we first model protein fitness landscapes via generative probabilistic models trained on genomic data, and we show that these models are able to predict the effect of individual mutations and to generate non-natural but biologically functional proteins. Second, we describe evolution as a stochastic process in these landscapes. The proposed framework accurately reproduces the sequence statistics of both short-time (experimental) and long-time (natural) protein evolution, suggesting applicability also to relatively data-poor intermediate evolutionary time scales, which are currently inaccessible to evolution experiments. Our model uncovers a highly collective nature of epistasis, which gradually changes the fitness effect of mutations in a diverging sequence context rather than acting via strong interactions between individual mutations. This collective nature triggers the emergence of a long evolutionary time scale, separating fast mutational processes inside a given sequence context from the slow evolution of the context itself.

Intrinsic Disorder Promotes Protein Refoldability and Enables Retrieval from Biomolecular Condensates

Wednesday, 11 December 2024, 11.00am to 12.00pm
Speaker: Stephen D Fried, Johns Hopkins University
Venue: CRUK CI Lecture Theatre

Abstract not available

About us

The Cambridge Centre for Data-Driven Discovery (C2D3) brings together researchers and expertise from across the academic departments and industry to drive research into the analysis, understanding and use of data science and AI. C2D3 is an Interdisciplinary Research Centre at the University of Cambridge. The centre:

  • Supports and connects the growing data science and AI research community 
  • Builds research capacity in data science and AI to tackle complex issues 
  • Drives new research challenges through collaborative research projects 
  • Promotes and provides opportunities for knowledge transfer 
  • Identifies and provides training courses for students, academics, industry and the third sector 
  • Serves as a gateway for external organisations 
