Big Data in Medicine: Exemplars and Opportunities in Data Science

Friday, 19 June 2015, 12.00pm to 7.00pm
Location: Cancer Research UK Cambridge Institute

The data generated by medical care and medically relevant research are rapidly becoming bigger and more complex, particularly with the advent of new technologies. Our ability to advance medical care and efficiently translate science into modern medicine is bounded by our capacity to access and process these big data. From human genetics and pathogen genomics to routine clinical documentation, from internal imaging to motion capture, from digital epidemiology to pharmacokinetics, and from treatment pathways to life course assessment, the big Vs of Big Data - volume, variety, velocity and veracity - abound in medicine. Statistical, mathematical, visualisation, and computational approaches from a wide range of disciplines, as well as systems for innovative ICT-based interventions, are needed to keep pace with the complexity in Big Data and to advance medicine.

On 19th June 2015 at the Cancer Research UK Cambridge Institute, Cambridge-based researchers from all Schools of the University and local research institutes, the pharmaceutical industry and our funding and commissioning partners met for an afternoon of talks demonstrating methods and opportunities for harnessing Big Data in medicine.  

Read the abstracts and selected presentation slides below. 

Programme at a glance 


Registration, Lunch & Poster Session



Opening remarks

Patrick Maxwell, Regius Professor of Physic


Session 1: Exemplars

Chairs: Simon Tavaré (CRUK-CI/DAMTP), John Aston (DPMMS)


Keynote: Statistical challenges in the analysis of genomic data

Sylvia Richardson, Director, MRC Biostatistics Unit


Building Insight Into Disease and Therapy from Real-World Evidence Using Graphs and Large-Scale Analytics

Nirmal Keshava, AstraZeneca R&D Information


Undiscovered Scientific Knowledge from Large Unstructured Text Collections in an Era of Big Data

Nigel Collier, Department of Theoretical and Applied Linguistics


Integrating Chemical and Biological Data for Drug Discovery

Andreas Bender, Centre for Molecular Informatics, Department of Chemistry


Mapping the information processing pathways of the cortex: challenges and opportunities

Andrew Thwaites, Psychology Dept, Cambridge University & MRC-CBSU





Session 2: Opportunities

Chairs: Lydia Drumright (Department of Medicine), John Todd (Cambridge Institute for Medical Research)


Keynote: Clinical Informatics

Afzal Chaudhry, BRC, Chief Clinical Information Officer


EMBL-EBI Big Data in Medicine Strategy

Paul Flicek, European Bioinformatics Institute


Smartphones, Big Data, and Psychiatry

Neal Lathia, Computer Laboratory, and Conor Farrington, Cambridge Centre for Health Services Research, Institute of Public Health


Treatment Pathways in Cancer Data

Brian Shand, National Cancer Registration Service, Public Health England


High-Content Microscopy and Big Data: Discovering the genes and pathways that control cells in health & disease

Rafael Carazo Salas, Department of Genetics


Poster talks



Mathematical Methods for Automatic Detection and Tracking of Dividing Cancer Cells in Phase Contrast Microscopy

Joana Sarah Grah, Department of Applied Mathematics and Theoretical Physics


Statistical tools for single cell gene expression analysis

Daphne Ezer, Department of Genetics/Cambridge Systems Biology Centre


From big data to big model: a probabilistic approach to infer cancer evolution

Ke Yuan, Cancer Research UK Cambridge Institute


Closing remarks

Keith McNeil, CUH Foundation Trust CEO


Drinks & Poster Session




Sketch-driven Data Analysis

Neil Satra, University of Cambridge Computer Laboratory

Limitations of de-identification: no reason not to share data

Neil Walker, Department of Medical Genetics

Non-negative matrix tri-factorisation with missing values, applied to drug sensitivity prediction

Thomas Alexander Brouwer, Computer Lab, University of Cambridge

Mineotaur: interactive visual analytics for high-content microscopy screens

Balint Antal, Department of Genetics, University of Cambridge

Molecular principles by which gene fusions affect protein interaction networks in cancer

Natasha Latysheva, MRC Laboratory of Molecular Biology

High-dimensional statistical approaches for heterogeneous molecular data in cancer medicine

Frank Dondelinger, MRC Biostatistics Unit

A computational approach to the genetic basis of antigenic change in influenza A

Sarah James, Department of Zoology

Performing large scale conditional analysis in GWAS: How to better exploit summary statistics?

Paul Newcombe, MRC Biostatistics Unit

Empirical Bayes in Genomics: when dimensionality is a blessing

Gwenael G.R. Leday, MRC Biostatistics Unit, Cambridge

Analysis of Iterative Screening with Stepwise Compound Selection Based on Novartis in-house HTS Data

Shardul Paricharak, Department of Chemistry

Little Data, Big Health

Wei Wang, Little Data Labs

Baal-ChIP: Allele-specific ChIP-seq analysis from cancer cell lines

Ines de Santiago, CRUK - Cambridge Institute

The potential of hyperspectral imaging in fluorescent contrast enhanced imaging

Anna Siri Luthman, Department of Physics and Cancer Research UK Cambridge Institute

The ContentMine

Jenny Molloy, Cambridge Synthetic Biology SRI, The ContentMine

Forthcoming talks

Achieving Consistent Low Latency for Wireless Real-Time Communications with the Shortest Control Loop

Thursday, 18 August 2022, 4.00pm to 5.00pm
Speaker: Zili Meng, Tsinghua University
Venue: FW11 and

Real-time communication (RTC) applications like video conferencing or cloud gaming require consistent low latency to provide a seamless interactive experience. However, wireless networks including WiFi and cellular, albeit providing a satisfactory median latency, drastically degrade at the tail due to frequent and substantial wireless bandwidth fluctuations. We observe that the control loop for the sending rate of RTC applications is inflated when congestion happens at the wireless access point (AP), resulting in untimely rate adaption to wireless dynamics. Existing solutions, however, suffer from the inflated control loop and fail to quickly adapt to bandwidth fluctuations. In this paper, we propose Zhuge, a pure wireless AP based solution that reduces the control loop of RTC applications by separating congestion feedback from congested queues. We design a Fortune Teller to precisely estimate per-packet wireless latency upon its arrival at the wireless AP. To make Zhuge deployable at scale, we also design a Feedback Updater that translates the estimated latency to comprehensible feedback messages for various protocols and immediately delivers them back to senders for rate adaption. Trace-driven and real-world evaluation shows that Zhuge reduces the ratio of large tail latency and RTC performance degradation by 17% to 95%.

Speaker Bio: Zili is a 3rd-year PhD student at Tsinghua University. His current research focuses on real-time video communications. He has published several papers in SIGCOMM / NSDI and received the Microsoft Research Asia PhD Fellowship, the Gold Medal of the SIGCOMM 2018 Student Research Competition, and two best paper awards.

BSU Seminar: "Genome-wide genetic models for association, heritability analyses and prediction"

Monday, 22 August 2022, 4.30pm to 5.30pm
Speaker: David Balding, Honorary Professor of Statistical Genetics at UCL Genetics Institute and University of Melbourne
Venue: Seminar Rooms 1 & 2, School of Clinical Medicine, Hills Road, Cambridge CB2 0SP

Although simultaneous analysis of genome-wide SNPs has been popular for over a decade, the problems posed by more SNPs than study participants (more parameters than data points), and by correlations among the SNPs, have not been adequately overcome, so that almost all published genome-wide analyses are suboptimal. While there has been much attention paid to the shape of prior distributions for SNP effect sizes, we argue that this attention is misplaced. We focus on what we call the "heritability model": a low-dimensional model for the expected heritability at each SNP, which is key to both individual-data and summary-statistic analyses. The 1-df uniform heritability model has been implicitly adopted in a wide range of analyses. Replacing it with better heritability models, using predictors based on allele frequency, linkage disequilibrium and functional annotations, leads to substantial improvements in estimates of heritability and selection parameters over traits, and over genome regions, as well as improvements in gene-based association testing and prediction. Key collaborators: Doug Speed (Aarhus, Denmark) and Melbourne PhD student Anubhav Kaphle.

Statistics Clinic Summer 2022 III

Wednesday, 31 August 2022, 5.30pm to 7.00pm
Speaker: Speaker to be confirmed
Venue: Venue to be confirmed

If you would like to participate, please fill in the sign-up form. The deadline for signing up for a session is 12pm on Monday the 29th of August. Subject to availability of members of the Statistics Clinic team, we will confirm your in-person or remote appointment.

This event is open only to members of the University of Cambridge (and affiliated institutes). Please be aware that we are unable to offer consultations outside clinic hours.

Statistics Clinic Summer 2022 IV

Wednesday, 21 September 2022, 5.30pm to 7.00pm
Speaker: Speaker to be confirmed
Venue: Venue to be confirmed

Abstract not available

Title to be confirmed

Monday, 26 September 2022, 3.00pm to 4.00pm
Speaker: Christopher Yau, University of Manchester
Venue: CRUK CI Lecture Theatre

Abstract not available