The Alan Turing Institute - Data Study Group

Closing date: Monday, 27 July 2020

The Alan Turing Institute is now accepting applications for its September 2020 Data Study Group, which brings together top talent from data science, artificial intelligence, and wider fields to analyse real-world challenges. The organisations and challenges leading the Data Study Group this September are:

  • CRUK Cambridge Institute - Modelling interactions driving breast cancer development
  • Greenvest Solutions - Forecasting wind energy production using satellite data
  • catsAi - Communicating high-street bakery sales predictions using counterfactual explanations
  • University of Strathclyde and Supergen Energy Networks Hub - Using machine learning to predict the onset of blackouts

 

Full information and application process

 

Organisations leading the challenges

CRUK Cambridge Institute - Modelling interactions driving breast cancer development

The Cancer Research UK (CRUK) Cambridge Institute is making available a comprehensive dataset of gene expression in 400 Estrogen Receptor (ER) positive breast cancer cell line samples, including control experiments and perturbations in the form of gene knockdowns. In the cell lines considered, ER is the main driver of breast cancer. Participants are invited to use AI and machine learning tools to explore the role of perturbation targets in the development of breast cancer; this work will allow new interventions to be devised to halt this process. No experience working with biological data is required.

Useful skills: Primarily novel network methods to analyse the data. No one is expected (or likely) to have experience of all of the following, but skills may include: network inference, neural networks, Bayesian networks, graph-based models, causal models, correlation/regression-based networks, Boolean Implication Networks, Nested Effect Models, Linear Effect Models, techniques to validate model robustness (e.g. bootstrapping), and data visualisation.

This work is supported by Wave 1 of The UKRI Strategic Priorities Fund under the EPSRC Grant EP/T001569/1, particularly the “Data Science for Science” theme within that grant and The Alan Turing Institute.

 

Greenvest Solutions - Forecasting wind energy production using satellite data

Greenvest is on a mission to accelerate renewable energy adoption worldwide. The start-up provides strategic technology to plan, monitor, and assess clean energy projects globally.

When deciding where to build a new wind farm, a company should assess the wind resources of an area with the minimum uncertainty possible. This is usually done in two steps. First, a preliminary assessment is carried out using low-resolution wind maps derived from ground stations. Second, at least one meteorological (met) mast is installed at the most promising locations to record at least one year of current wind data.

This approach presents several problems. On the one hand, even if the wind maps encompass a large temporal frame, they are often inaccurate and poorly interpolated to the desired location. On the other hand, met masts offer precise measurements of wind, but they are expensive to set up and may produce an inaccurate prediction of the long-term production of the wind farm because of the year-by-year variability of wind resources.

A newer approach to this problem is to use mesoscale models and computational fluid dynamics to interpolate satellite-derived data, but this is a computationally expensive method that often underperforms for complex terrains and coarse grids of satellite data. A promising solution is to use machine learning to predict wind resources at a given location starting from satellite-measured data.

Thus, the aim of this challenge is to forecast wind resources close to the ground for a specific location. The key difficulty is to understand how terrain data and surface roughness could be used to train a model that interpolates the satellite data to the desired location.

This could be achieved by combining ML, big data, and geospatial expertise to predict the amount of wind resource. To reach this goal, we will attempt to leverage time series of wind data, atmospheric models, digital surface models, and surface roughness to make an overall prediction.

Useful skills: Time series, functional data analysis, satellite images, machine learning, neural networks.

 

catsAi - Communicating high-street bakery sales predictions using counterfactual explanations

Many business decisions start with a simple question: “How many will I sell?” Sales are influenced by many factors, including location, product type and weather. catsAi will provide access to a comprehensive dataset of historical sales and meteorological data across thousands of bakery sites. Participants are invited to investigate whether data science and AI can identify factors influencing sales that are poorly defined or as yet undiscovered, and how counterfactual explanations can be applied to promote adoption of and trust in these predictions.

Useful skills: programming skills, machine learning, spatio-temporal analysis.

 

University of Strathclyde and Supergen Energy Networks Hub - Using machine learning to predict the onset of blackouts

Electrical power systems are highly non-linear, dynamical and complex systems, making the investigation of their dynamic behaviour very challenging, especially under the increasingly uncertain operation introduced by renewable energy sources on our way to tackling climate change. One of the core mechanisms that can lead to blackouts is the sequential disconnection of power system components, commonly referred to as cascading failures. In this challenge we are interested in investigating the potential of machine learning to predict such events at their onset, using a provided dataset of detailed simulated time-domain responses.

Useful skills: Experience with relevant machine learning methods for time-domain data, a computer science/machine learning background, enthusiasm, and Python.
