Using AI to understand how viruses work
Researcher spotlight: Katy Brown, Department of Pathology,
C2D3 Early Career Researcher Seed Fund awardee, 2024
Cambridge researcher Katy Brown works in virus discovery, looking for novel or unexpected viruses. Once a virus has been identified, the aim is to understand as much as possible about how it replicates and causes disease. Proteins are key to these processes, so understanding how viruses produce proteins could lead to new vaccines or treatment options.
Many high-profile human viruses known for triggering major outbreaks, including coronaviruses and the Zika virus, belong to a group known as “positive-strand RNA viruses”. Some of the viruses in this group produce proteins by making short molecules of RNA (ribonucleic acid), known as subgenomic RNAs, that carry instructions for specific proteins. To determine whether a virus uses this method, these molecules need to be identified, but the lab experiments required can be expensive and time-consuming. Instead, Katy proposed using AI classification methods to detect them automatically in RNA sequencing datasets, which record the RNA molecules present in a sample. Almost five million datasets of this type are publicly available, covering a diverse range of organisms from humans to insects. Although usually collected for other purposes, many of these datasets contain sequences from infecting viruses, providing a vast amount of potential data to study.
Katy successfully applied for funding from the C2D3 Early Career Researcher Seed Fund to host Purav Gupta, an undergraduate student from the University of Toronto, during the summer of 2024. Together they developed a tool to automatically download, process and filter datasets from online repositories and an algorithm to detect these short RNA molecules. Starting with data from cells deliberately infected with one of four common human viruses, they were able to accurately determine whether a virus uses this method to produce proteins in 89% of tested cases.
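To give a flavour of what detecting these short RNA molecules in sequencing data can involve, here is a minimal Python sketch. It is not the algorithm Katy and Purav developed, which is not described here. It assumes, purely for illustration, that the subgenomic RNAs share one end of the viral genome, so the number of sequencing reads covering each position (the read depth) steps up sharply where they begin. The function name `find_coverage_step` and the parameters `window` and `min_fold` are hypothetical.

```python
def find_coverage_step(depth, window=50, min_fold=3.0):
    """Find the sharpest step-up in read depth along a viral genome.

    Illustrative heuristic only (an assumption, not the published method).
    depth    -- list of read depths, one value per genome position
    window   -- number of positions averaged on each side of a candidate step
    min_fold -- smallest downstream/upstream ratio that counts as a step

    Returns (position, fold_change) for the largest step, or None if no
    step reaches min_fold.
    """
    best_pos, best_fold = None, 0.0
    for pos in range(window, len(depth) - window + 1):
        upstream = sum(depth[pos - window:pos]) / window
        downstream = sum(depth[pos:pos + window]) / window
        # Floor the denominator at 1.0 to avoid dividing by zero depth.
        fold = downstream / max(upstream, 1.0)
        if fold > best_fold:
            best_pos, best_fold = pos, fold
    if best_fold >= min_fold:
        return best_pos, best_fold
    return None


# Toy example: depth jumps from 10 to 50 reads at position 300.
toy_depth = [10] * 300 + [50] * 200
print(find_coverage_step(toy_depth))   # (300, 5.0)
print(find_coverage_step([10] * 500))  # None: flat coverage, no step
```

Real datasets are far noisier than this toy example, which is one reason a trained classifier may be preferable to a fixed threshold.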
As well as kick-starting this important piece of research, the funding enabled Katy to build skills in project and grant management and gain experience as a primary supervisor. Katy highlighted that hosting a summer student is a great way to build a collaboration with another research group, and passed on an important lesson learned: it takes longer than you think to organise the logistics of hosting a visiting student, so start early!
Purav said the following about his experience: “It’s been an incredible summer working at the Firth Lab, a computational virology lab, where I developed an automated pipeline to analyze transcriptomic data for detecting viral subgenomic RNAs. This project pushed me in all the best ways, allowing me to refine my skills in bioinformatics and computational biology while contributing to meaningful research in virology.”
What’s next?
Following Purav’s visit, the team is continuing to refine the scripts and test them on a wider variety of datasets, moving from cells deliberately infected with known viruses to “wild” samples collected for different purposes. Interestingly, data from mosquitoes rather than humans can be used to investigate certain human viruses, a bonus as mosquito data comes with far fewer privacy concerns!
They are now developing these scripts into an open-source tool that others can use. Looking ahead, they hope it will help researchers quickly analyse newly discovered viruses, rapidly increasing our understanding of how they operate and helping us to prepare for future viral outbreaks.