Cambridge-Turing sessions: collaborative data science and AI research

C2D3 event

Wed, 23 Jun 2021 1:00 PM - 5:00 PM

You are invited to the Cambridge-Turing sessions, hosted by the Cambridge Centre for Data-Driven Discovery (C2D3) and supported by The Alan Turing Institute. This series of online sessions reflects the strengths and collaborations of data science and AI research across the University, and many of the presentations will showcase the University’s partnership with The Alan Turing Institute. Our speakers will cover a wide range of themes and disciplines, including the humanities, natural language processing for mental health, digital twins, weather and climate, and healthcare.

We invite participants from academia, industry, government, the third sector and anywhere in between. C2D3 are also keen to find collaborators and make connections in the East of England region.

Registration

To attend the event, please follow the registration link below. Attendance is free of charge. Please register in advance so there is time for you to be sent the event link.

Register here

Programme

13:00-13:20 Session 1: Introducing The Alan Turing Institute, chaired by Professor Zoe Kourtzi (Turing University Lead for the University of Cambridge)

13:00: Introduction to the Turing, Turing for universities, Turing for industry and other organisations by Daniel Lovelock (Senior Academic Engagement Manager, The Alan Turing Institute) and Katrina Payne (Partnerships Development Lead, The Alan Turing Institute).

13:20-14:25 Session 2: Presenting the Turing Fellow research projects, chaired by Professor Zoe Kourtzi (Turing University Lead for the University of Cambridge)

13:20: Session introduction
13:25: Data science and AI in the humanities: Professor Robert Foley, Dr Jason Gellis and Dr Camila Rangel Smith, ‘Data science and the reconstruction of past behaviour: capturing the stone tool technology of prehistoric people’
13:45: Speaker Q&A
13:55: Data science and AI for mental health: Dr Sarah Morgan, ‘Assessing psychosis risk using quantitative markers of transcribed speech’
14:15: Speaker Q&A

14:25-14:40: Break

14:40-16:30 Session 3: Research showcase, chaired by Professor Gos Micklem (Department of Genetics and Department of Applied Mathematics and Theoretical Physics, University of Cambridge)

14:40: Data-centric engineering: Rebecca Ward, ‘Growing Underground: towards a digital twin for crop production’
15:00: Data science and the environment: Rachel Furner, ‘Developing data-driven forecast systems’
15:20: Data science and AI for healthcare: Adam Berman, ‘Automated approaches to diagnosing Barrett's Oesophagus from Cytosponge’
15:40: Data science for history and social sciences: Dr Alexis Litvine, ‘Extracting structured data from historical insurance records for Aviva. An application of a new tool (THOTH) to extract tabular data at scale’
16:00: Foundations of data science: Professor Carola-Bibiane Schönlieb, ‘Looking into the black box: how mathematics can help to turn deep learning inside out’
16:20: Closing remarks and opening of networking

16:30-17:00 Session 4: Get connected

Interactive networking and discussions
Ask a Turing Fellow
Connect to a Cambridge academic and researcher
Q&As with speakers and session chairs

All times above are British Summer Time (BST).

Abstracts

Prof. Robert Foley ⁽¹⁾, Dr Jason Gellis ⁽²⁾, Dr Camila Rangel Smith ⁽³⁾

⁽¹⁾ Leverhulme Centre for Human Evolutionary Studies, University of Cambridge; Interdisciplinary Centre for Archaeology and Evolution of Human Behaviour, University of Algarve, Faro, Portugal; Turing Fellow. ⁽²⁾ Leverhulme Centre for Human Evolutionary Studies, University of Cambridge. ⁽³⁾ The Alan Turing Institute.

Title: Data science and the reconstruction of past behaviour: capturing the stone tool technology of prehistoric people

For most of human history, stone was the primary raw material underpinning the technology on which human adaptation depended. The flaking of stone to create sharp edges and tools of particular shapes and sizes represents one of our major evolutionary advances. Once the skill was acquired, stone tools were made and discarded in prolific quantities, and they changed in ways that mapped developments in cognition, behaviour and ecology. Over more than one hundred years, archaeologists and anthropologists have collected vast numbers of stone tools and developed intensive methods of analysis. As a result, archived photographs and drawings of lithics form a major resource. The Turing-funded project PALAEOANALYTICS aims to develop AI/machine learning approaches to automate the retrieval of this information and to expand the data that can be collected. In this talk we will present the progress we have made in developing computer vision techniques to collect key morphometric information from drawings of stone tools, focusing on the features that indicate the technological processes prehistoric people used to produce them.

Dr Sarah Morgan

Accelerate Science Research Fellow, Department of Computer Science and Technology, University of Cambridge; Senior Research Associate, Cambridge Brain Mapping Unit, Department of Psychiatry, University of Cambridge; Turing Fellow.

Title: Assessing psychosis risk using quantitative markers of transcribed speech

There is a pressing clinical demand for tools to predict individual patients' disease trajectories in schizophrenia and other conditions involving psychosis; however, to date such tools have proved elusive. Behaviourally and cognitively, psychosis expresses itself through subtle alterations in language. Recent work has suggested that natural language processing (NLP) markers of transcribed speech might be powerful predictors of later psychosis (Mota et al. 2017; Corcoran et al. 2018). For example, Corcoran et al. (2018) used quantitative markers of semantic coherence, collected at baseline from individuals at clinical high risk for psychosis, to predict transition to psychosis with 79% accuracy.

However, it remains unclear which NLP measures are most likely to be predictive, how different NLP measures relate to each other, and how best to collect speech data from patients. In this talk, I will discuss our research tackling these questions, as well as the wider challenges of translating this type of approach to the clinic. Ultimately, computational markers of speech have the potential to transform healthcare for mental health conditions such as schizophrenia, since they are relatively easy to collect and could be measured longitudinally to quickly identify changes in patients’ disease trajectories.

Rebecca Ward

Postdoctoral Research Associate, The Alan Turing Institute, with Dr Ruchi Choudhary (University of Cambridge), Data-centric engineering group (ASG)

Title: Growing Underground: towards a digital twin for crop production

The growth of building-integrated agriculture, as one potential way of reducing food miles for city-dwellers, brings with it disparate problems that present new challenges to the agriculture industry. The dual aim of keeping energy costs to a minimum while maximising crop growth is particularly challenging. Extensive monitoring provides valuable information, and statistical analysis of historic conditions can be used to generate forecasting models. Physics-based simulation can also help: by simulating the interaction of the vegetation with the surrounding environment, ‘what-if’ scenario tests can be performed efficiently and the impact of potentially sub-optimal conditions explored. We are fortunate to work closely with the operators of Growing Underground, a farm located in previously disused tunnels in Clapham, London. As part of a long-running project encompassing both monitoring and simulation, a digital twin is being developed that will enable the operator both to visualise current and forecast farm conditions and to explore future scenarios. In this talk we will describe the development of, and future plans for, the digital twin, and discuss lessons learnt for the wider application of digital twin technology.

Rachel Furner

PhD student at the British Antarctic Survey and the Department of Applied Mathematics and Theoretical Physics, University of Cambridge.

Title: Developing data-driven forecast systems

The recent boom in machine learning and data science has created a number of new opportunities in the environmental sciences. Process-based weather and climate models (simulators) are the best tools we have to predict, understand and potentially mitigate the impacts of climate change and extreme weather. However, these models are incredibly complex and require huge amounts of HPC resources. Machine learning offers the opportunity to greatly improve their computational efficiency.

Here I discuss recent work to develop a data-driven model of the ocean, an integral part of the weather and climate system. We train a neural network on the output of an expensive process-based simulator of an idealised channel configuration of oceanic flow. We show that the model learns the complex dynamics of the system well, replicating both the mean flow and the details within the flow over single prediction steps. We also find that when the model is iterated, its predictions remain stable and continue to match the ‘truth’ over a short-term forecast period (here, around a week).

Adam Berman

PhD student at the Cancer Research UK Cambridge Institute, University of Cambridge

Title: Automated approaches to diagnosing Barrett's Oesophagus from Cytosponge

Deep learning methods have been shown to achieve excellent performance on diagnostic tasks, but how best to combine them with expert knowledge and existing clinical decision pathways is still an open challenge. This question is particularly important for the early detection of cancer, where high-volume workflows may benefit from (semi-)automated analysis. Here we present a deep learning framework for analysing Cytosponge samples, showing that learning methods can perform quality control, diagnosis and atypia detection with high accuracy.

Dr Alexis Litvine

Faculty of History and Cambridge Digital Humanities

Title: Extracting structured data from historical insurance records for Aviva. An application of a new tool (THOTH) to extract tabular data at scale

Handwritten text recognition (HTR) technology has recently become a viable method for transcribing handwritten historical documents, making archival documents ‘machine-readable’ for the first time. HTR is now used successfully by thousands of amateur and professional historians, archivists, genealogists, librarians and social scientists around the world. However, it is not yet suitable for sources with complex layouts or tabular structures. This key limitation prevents social scientists and beneficiaries from applying HTR to documents such as censuses, civil or military records, and tax lists. Our technology, THOTH, developed by Yiannos Stathopoulos (Computer Science), Oliver Dunn (History) and Alexis Litvine (History/CDH), makes this possible by streamlining data extraction from tables.

Archived manuscript tables provide detailed insights into more than 500 years of history, covering nearly every aspect of people's lives, from their birth, education, health, profession and housing up to their death and legacy; they represent tens of thousands of shelf kilometres of historical documents preserved in European archives alone. Furthermore, large portions of these documents are becoming available as digital images thanks to systematic digitisation efforts. As one shelf kilometre corresponds on average to ten million page-images, most large national archives will soon be hosting several billion images. THOTH can make much of the data contained in such documents analysable and searchable by both researchers and the public.

Since 2018, THOTH has combined several proven computer vision technologies into our own AI workflow. What started as a tool for our own data needs soon attracted the interest of non-academic partners, who expressed a wish to adopt our technology. We recently benefitted from a £3,000 ESRC-IAA discretionary fund grant to buy equipment for processing large-scale images and to set up Osiris-AI Ltd (www.osiris-ai.com). We are currently working with Aviva Plc to digitise records of the former Hand-in-Hand fire insurer, held at London Metropolitan Archives on behalf of Aviva. To do this, we will digitise the complete collection of Hand-in-Hand insurance policy registers held at London Metropolitan Archives and create a structured dataset for use by researchers and the public. These registers are particularly interesting because their long time span provides scholars with a wealth of useful data, and because they showcase Aviva's heritage as a national insurance institution. The registers contain approximately 1.8 million unique observations of London addresses insured against fire between 1697 and 1865, which will be geolocated to specific locations in London. Partnering with Layers of London (layersoflondon.org), we will then be able to offer a striking visualisation of the data created with THOTH.

Professor Carola-Bibiane Schönlieb

Department of Applied Mathematics and Theoretical Physics; Turing Fellow.

Title: Looking into the black box: how mathematics can help to turn deep learning inside out

Deep learning has had a transformative impact on a wide range of tasks related to artificial intelligence, from computer vision and speech recognition to playing games. Still, the inner workings of deep neural networks are far from clear, and designing and training them is seen as almost a black art. In this talk we will try to open this black box a little by using the mathematical structure of neural networks, described through differential equations and mathematical optimisation. The talk will be illustrated with examples from image processing and computer vision, ranging from biomedical imaging to remote sensing.

Social media

We will be using #CamTuringSessions on our social media.