
Cambridge Big Data Research Symposium

Monday, 26 November 2018, 9.00am to 6.00pm

The Cambridge Big Data Strategic Research Initiative links together several hundred researchers across the University of Cambridge and beyond, to foster interdisciplinary collaborations in the fields of data science, data-driven discovery, and AI.

This Symposium will showcase cross-disciplinary research and highlight research challenges, with a particular focus on projects involving biosciences and clinical medicine.


Programme Schedule:

08:45 - 09:20

Registration and Coffee

09:20 - 09:30

Housekeeping / Opening Remarks

Anna Vignoles (Faculty of Education)

09:30 - 09:40

ATI Introduction

Zoe Kourtzi (The Alan Turing Institute)


Session 1: Physical Science

Chair: Richard McMahon

09:40 - 10:00

Big data in the LHC, and a personal view of the state of Machine Learning within particle physics                  

Chris Lester (Physics)
10:00 - 10:20

The Cambridge Centre for Doctoral Training in Data Intensive Science

Kaisey Mandel (DPMMS/Astronomy)
10:20 - 10:40

Towards exascale simulations of our Universe

Debora Sijacki (Astronomy)


10:40 - 11:10

Morning Coffee Break


Session 2: Bio Sciences

Chair: Gos Micklem

11:10 - 11:30

Using cryoEM to define the structures of large and complex energy-transducing enzymes

Judy Hirst (MRC MBU)
11:30 - 11:50

Observing the world from space

David Coomes (Plant Sciences)
11:50 - 12:10

Discovering ancient populations and human ancestry using global genome sequence data

Aylwyn Scally (Genetics)



Session 3: Early Career

Chair: Jatinder Singh

12:10 - 12:20

Active learning: Data quality over quantity

Sian Gooding (Computer Lab)
12:20 - 12:30

Data-driven evaluation of en masse treatment in Uganda

Goylette Chami (Pathology)
12:30 - 12:40

The impact of data analytics on fundamental rights

Heleen Janssen (Computer Lab)
12:40 - 12:50

Big Data; Small computation

Daniel Bates (Computer Lab)
12:50 - 13:00

Performance tuning with structured Bayesian optimisation and reinforcement learning

Eiko Yoneki (Computer Lab)
13:00 - 13:10

Decoding biology through Variational Autoencoders

Helena Andres Terre (Computer Lab)


13:10 - 14:00

Lunch



Session 4: AI in Healthcare

Chair: Carola-Bibiane Schönlieb and Mihaela van der Schaar

14:00 - 14:20

Machine learning tools for early diagnosis and prediction in dementia

Joseph Giorgio (Adaptive Brain Lab)

14:20 - 14:40

Prägnanz: Building AI-based clinical workflows for radiotherapy & radiomics research

Rai Jena (Oncology)
14:40 - 15:00

Big data/big sick: Data science and critical illness

Ari Ercole (Medicine)



Session 5: Social Sciences

Chair: Anna Vignoles

15:00 - 15:20

Data, machine learning, and connecting with customers

Orlando Machado (Aviva)
15:20 - 15:40

GDPR and 'Big Data' research               

David Erdos (Faculty of Law)
15:40 - 16:00

Real-time analysis of urban sensor data

Ian Lewis (Department of Computer Science and Technology)


16:00 - 16:30

Afternoon Tea Break


Session 6: Humanities

Chair: Anne Alexander

16:30 - 17:00

Data mining English family history society records

Gill Newton (CAMPOP)
17:00 - 17:30

The Concept Lab

Pete de Bolla (English)
17:30 - 17:40

Closing Remarks

Filippo Spiga (Cambridge Big Data)


17:40 - 18:40

Wine Reception and Networking

Registration for this event has closed.

How to find the Sainsbury Laboratory.

Forthcoming talks

Achieving Consistent Low Latency for Wireless Real-Time Communications with the Shortest Control Loop

Thursday, 18 August 2022, 4.00pm to 5.00pm
Speaker: Zili Meng, Tsinghua University
Venue: FW11 and

Real-time communication (RTC) applications like video conferencing or cloud gaming require consistent low latency to provide a seamless interactive experience. However, wireless networks, including WiFi and cellular, provide a satisfactory median latency but degrade drastically at the tail due to frequent and substantial wireless bandwidth fluctuations. We observe that the control loop for the sending rate of RTC applications is inflated when congestion happens at the wireless access point (AP), resulting in untimely rate adaptation to wireless dynamics. Existing solutions, however, suffer from the inflated control loop and fail to adapt quickly to bandwidth fluctuations. In this paper, we propose Zhuge, a purely wireless-AP-based solution that shortens the control loop of RTC applications by separating congestion feedback from congested queues. We design a Fortune Teller to precisely estimate per-packet wireless latency upon its arrival at the wireless AP. To make Zhuge deployable at scale, we also design a Feedback Updater that translates the estimated latency into comprehensible feedback messages for various protocols and immediately delivers them back to senders for rate adaptation. Trace-driven and real-world evaluation shows that Zhuge reduces the ratio of large tail latency and RTC performance degradation by 17% to 95%.

Speaker Bio: Zili is a 3rd-year PhD student at Tsinghua University. His current research focuses on real-time video communications. He has published several papers at SIGCOMM and NSDI and has received the Microsoft Research Asia PhD Fellowship, the Gold Medal of the SIGCOMM 2018 Student Research Competition, and two best paper awards.

BSU Seminar: "Genome-wide genetic models for association, heritability analyses and prediction"

Monday, 22 August 2022, 4.30pm to 5.30pm
Speaker: David Balding, Honorary Professor of Statistical Genetics at UCL Genetics Institute and University of Melbourne
Venue: Seminar Rooms 1 & 2, School of Clinical Medicine, Hills Road, Cambridge CB2 0SP

Although simultaneous analysis of genome-wide SNPs has been popular for over a decade, the problems posed by having more SNPs than study participants (more parameters than data points) and by correlations among the SNPs have not been adequately overcome, so almost all published genome-wide analyses are suboptimal. While much attention has been paid to the shape of prior distributions for SNP effect sizes, we argue that this attention is misplaced. We focus on what we call the "heritability model": a low-dimensional model for the expected heritability at each SNP, which is key to both individual-data and summary-statistic analyses. The 1-df uniform heritability model has been implicitly adopted in a wide range of analyses. Replacing it with better heritability models, using predictors based on allele frequency, linkage disequilibrium and functional annotations, leads to substantial improvements in estimates of heritability and selection parameters over traits and over genome regions, as well as improvements in gene-based association testing and prediction. Key collaborators: Doug Speed (Aarhus, Denmark) and Anubhav Kaphle (PhD student, Melbourne).

Statistics Clinic Summer 2022 III

Wednesday, 31 August 2022, 5.30pm to 7.00pm
Speaker: Speaker to be confirmed
Venue: Venue to be confirmed

If you would like to participate, please fill in the sign-up form. The deadline for signing up for a session is 12pm on Monday 29 August. Subject to the availability of members of the Statistics Clinic team, we will confirm your in-person or remote appointment.

This event is open only to members of the University of Cambridge (and affiliated institutes). Please be aware that we are unable to offer consultations outside clinic hours.

Statistics Clinic Summer 2022 IV

Wednesday, 21 September 2022, 5.30pm to 7.00pm
Speaker: Speaker to be confirmed
Venue: Venue to be confirmed

Abstract not available

Title to be confirmed

Monday, 26 September 2022, 3.00pm to 4.00pm
Speaker: Christopher Yau, University of Manchester
Venue: CRUK CI Lecture Theatre

Abstract not available