Cambridge-Turing sessions reloaded: collaborative data science and AI research

Thursday, 21 October 2021, 10.00am to 12.15pm
Cambridge-Turing Sessions

You are invited to the second series of Cambridge-Turing sessions reloaded, hosted by Cambridge Centre for Data-Driven Discovery (C2D3). This series of online sessions reflects the strengths and collaborations of data science and AI research across the University and many of the presentations will showcase the University’s partnership with The Alan Turing Institute. Our speakers will cover a wide range of themes and disciplines including predicting personalities, mental health, personalised healthcare, and data science for science and humanities.

We invite participants from academia, industry, government, third sector and anywhere in between. C2D3 are also keen to find collaborators and make connections in the East of England region.


To attend the event, please follow the registration link below. Attendance is free of charge. Please register in advance so there is time for you to be sent the event link.

Register here


We are delighted to have Paul Kirk as our event Chair. Paul is a group leader (Programme Leader Track) within the Precision Medicine and Inference for Complex Outcomes (PREM) theme at the MRC Biostatistics Unit.

10:00 - 10:05 Opening and Session 1: Presenting Turing Research Projects

  • 10:05: AI-guided solutions for early detection of dementia - Professor Zoe Kourtzi (Turing University Lead for the University of Cambridge)
  • 10:15: Q&A 
  • 10:25: The cooked and the raw; extracting and exploiting structured and unstructured clinical data from patient electronic health records - Dr Paul Schofield (Department of Physiology, Development and Neuroscience)
  • 10:45: Q&A 

10:55 - 12:10 Session 2: Research showcase

  • 10:55: Session introduction
  • 11:00: Data driven built environment design for decoupling energy and health burdens in poverty - Dr Ronita Bardhan (Department of Architecture)
  • 11:20: Modelling the Impact of Climate Change on UK Agriculture - Dr Sebastian Ahnert (Department of Chemical Engineering and Biotechnology at Cambridge, and also seconded to the Turing as Senior Research Fellow in the Data Science for Science programme)
  • 11:40: Group Q&A

12:10 - 12:15 Closing

The above times are UK BST


Prof. Zoe Kourtzi

Head of the Adaptive Brain Lab; Turing Fellow; Fellow of Downing.

Title: AI-guided solutions for early detection of dementia

Alzheimer’s disease (AD) is characterised by a dynamic process of neurocognitive changes from normal cognition to mild cognitive impairment (MCI) and progression to dementia. However, not all individuals with MCI develop dementia. Predicting whether individuals with MCI will decline (i.e. progressive MCI) or remain stable (i.e. stable MCI) is impeded by patient heterogeneity due to comorbidities that may lead to MCI diagnosis without progression to AD. Despite the importance of early diagnosis of AD for prognosis and personalised interventions, we still lack robust tools for predicting individual progression to dementia. Here, we propose a novel trajectory modelling approach based on metric learning that mines multimodal data from MCI patients to derive individualised prognostic scores of cognitive decline due to AD. Our approach affords the generation of a predictive and interpretable marker of individual variability in progression to dementia due to AD based on cognitive data alone. Including non-invasively measured biological data (grey matter density, APOE 4) enhances predictive power and clinical relevance. Our trajectory modelling approach has strong potential to facilitate effective stratification of individuals based on prognostic disease trajectories, reducing MCI patient misclassification with important implications for clinical practice and discovery of personalised interventions.

Dr Paul Schofield

Reader in Biomedical Informatics; Department of Physiology, Development and Neuroscience

Title: The cooked and the raw; extracting and exploiting structured and unstructured clinical data from patient electronic health records

Electronic health records (EHRs) contain information critical to the realisation of the promise of personalised medicine, but also data essential for the discovery of the molecular basis of disease. Clinical information systems and EHRs were not developed for the discovery, integration and export of information, most being based on the concept of paper records going back to the 1990s. Consequently we find in EHRs information contained in administrative, diagnostic and procedure codes, which are highly structured and standardised (pre-cooked); the results of investigative tests, ranging from blood chemistry to images, which might be regarded as partially structured information (lukewarm?); and finally narrative reports of clinical encounters and discharge letters, which are rich sources of information but completely unstructured – raw data. Reliably extracting and integrating these types of information is a huge challenge, but the ability to retrieve coded and quantitative data into a common symbolic framework opens up the possibility of connecting these data together with the large amounts of background knowledge now available, to begin to make semantic sense of our whole ‘menu’.

I will discuss three approaches to extracting and using EHR information. The first uses the Komenti platform, which is designed to extract information from free text into semantically formalised ontological annotations. The second is an approach to combining quantitative data into that same semantic framework. The third, a new resource, axiomatises ICD-10 terms using the Human Phenotype Ontology for integration with existing knowledge and, for example, patient classification. The promise of these multi-pronged approaches will be discussed.

Dr Ronita Bardhan 

Assistant Professor of Sustainability in the Built Environment; Director, MPhil in Architecture and Urban Studies; Fellow of Architecture at Selwyn College.

Title: Data driven built environment design for decoupling energy and health burdens in poverty 

The built environment is a significant modifiable factor that influences health and energy decisions. Yet how, and to what extent, built environment design parameters affect quality of life remains unknown. The impacts of a dysfunctional space design are most aggravated in poorer communities, where the asymmetries are profound. This talk shows how various data streams, namely (i) quantitative data from environmental/energy sensors, (ii) qualitative data on agency and use of space, and (iii) big data on performance metrics like energy consumption, can enable an understanding of the effects of building design parameters on quantifiable outcomes. It advances the innovative paradigm of data-driven design to decouple health and energy burdens from poverty. Using novel datasets from Mumbai, India, the talk demonstrates how design can help understand health metrics like walkability in cities, outdoor heat stress due to climate change and indoor environmental quality in slum transitional housing. One of the challenges of working in resource-constrained communities is the absence of data. This talk discusses how novel datasets like films, social dialogues and collective intelligence can be used in data-driven design for a sustainable and healthy future.

Dr Sebastian Ahnert

Department of Chemical Engineering and Biotechnology at Cambridge, and also seconded to the Turing as Senior Research Fellow in the Data Science for Science programme

Title: Modelling the Impact of Climate Change on UK Agriculture

Climate change is likely to have a large impact on UK agriculture, and on the viability of current crops in particular. This project aims to integrate data and models of plant development, plant pathology, crop yields, and climate science to form an integrated national crop modelling framework for the UK. This could allow us to predict the impact of climate change on UK agriculture and food security over the coming 50 years. The project brings The Alan Turing Institute, Rothamsted Research, John Innes Centre, and the University of Exeter together to address this challenge in close collaboration. Particular areas of focus are the integration of climate-disease interdependence and the genetics of temperature response into existing crop models, and an attempt to build large-scale machine learning models of crop growth based on satellite, weather, and soil data, with precision crop yield data as a training data set.



Thank you to our sponsors for their sponsorship towards this event.

The Isaac Newton Trust 

Cambridge University Press & Assessment

Our sponsor Cambridge University Press & Assessment publishes three open access, peer-reviewed titles that explore the impact of data science: Data & Policy; Data-Centric Engineering; and Environmental Data Science. You can read more about each title – and some associated webinars and books – on this Data Science hub page at CUP. Authors affiliated with Cambridge University, like those at many other institutions, can publish if accepted on an open access basis in these journals with no article processing charge, courtesy of an overarching Read & Publish agreement. Contact if you would like to find out more.

Find out more about @CambridgeUP’s #OpenAccess data science titles @data_and_policy @dce_journal and @envdatascience and some related webinars and books here:


Social Media

We will be using #CamTuringSessions on our social media.
