Treatment Pathways in Cancer Data

Big Data in Medicine: Exemplars and Opportunities in Data Science

Brian Shand, National Cancer Registration Service, Public Health England

Dr Brian Shand
National Cancer Registration Service, Public Health England
Brian.Shand@phe.gov.uk

Background:

The National Cancer Registration Service (NCRS), part of Public Health England, registers all malignant tumours and some pre-cancerous lesions occurring in people resident in England at the time of diagnosis. Cancer registration is the expert task of distilling a definitive assessment of each tumour from a range of complex and incomplete clinical sources. This relies on efficient information processing, including the handling of large volumes of electronic data. The resulting dataset is an immensely valuable, high-quality resource for understanding the disease burden of cancer, quantifying improvements in treatment over time, and planning future cancer services.


The Challenge:

Increasingly rich electronic data sources are now available for cancer patients. These enable new classes of cancer data analysis, with much richer semantic structure than conventional statistical analyses of the data. For example, patients' treatments and other clinical events can be seen as nodes in a graph structure, supporting new ways of visualising and exploring cancer treatment pathways across the health service. Furthermore, an individual patient's treatments can be condensed into a summary feature vector, a kind of “patient fingerprint”. By clustering these fingerprints, we can efficiently identify patterns of treatment that are hard to see in the underlying data. With rich and accurate data, this can provide new insights into which treatments would most benefit which patients, and assist in the planning and analysis of new cancer treatment options.
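The two ideas above, a pathway graph and clustered patient fingerprints, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the event names and patient identifiers are made up, the fingerprint is assumed to be a simple count of each event type, and a tiny k-means routine stands in for whatever clustering method would actually be used. None of it is taken from NCRS systems.

```python
from collections import Counter

# Hypothetical, made-up event sequences; not real patient data.
pathways = {
    "p1": ["diagnosis", "surgery", "chemotherapy"],
    "p2": ["diagnosis", "surgery", "chemotherapy"],
    "p3": ["diagnosis", "radiotherapy"],
    "p4": ["diagnosis", "radiotherapy", "radiotherapy"],
}


def transition_counts(pathways):
    """Count directed edges (event -> next event) across all patients."""
    edges = Counter()
    for events in pathways.values():
        edges.update(zip(events, events[1:]))
    return edges


def fingerprint(events, vocabulary):
    """Summarise one patient's events as a vector of per-event counts."""
    counts = Counter(events)
    return tuple(counts[name] for name in vocabulary)


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Very small k-means; centroids start at the first k distinct points."""
    centroids = list(dict.fromkeys(points))[:k]
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels


vocabulary = sorted({e for events in pathways.values() for e in events})
edges = transition_counts(pathways)
vectors = [fingerprint(events, vocabulary) for events in pathways.values()]
labels = kmeans(vectors, k=2)

for (source, target), count in sorted(edges.items()):
    print(f"{source} -> {target}: {count}")
print(dict(zip(pathways, labels)))
```

In this toy data the surgery and chemotherapy patients fall into one cluster and the radiotherapy patients into the other. A count vector discards the order of events, so a real fingerprint would probably need to encode timing or sequence as well.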

Future Opportunities:

A further challenge is that healthcare data is difficult to analyse and release safely and securely without breaching patient confidentiality. Transforming the data into a condensed form provides new possibilities for pseudonymisation and non-identifiable data release. This would enable researchers outside the health service to contribute more easily to studies using this data, and assist in the development of new Big Data analysis techniques.
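As a rough illustration of what a condensed, pseudonymised release might look like, the sketch below replaces patient identifiers with keyed hashes and publishes only counts of identical fingerprints, suppressing small groups. The key, the threshold of 5 and the function names are all hypothetical choices made for this example, and keyed hashing with small-number suppression does not by itself guarantee that a release is non-identifiable.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret, assumed to be held only by the data controller.
SECRET_KEY = b"example-key-held-by-the-data-controller"
# Hypothetical suppression threshold; a real policy would set this value.
MIN_CELL_COUNT = 5


def pseudonymise(patient_id, key=SECRET_KEY):
    """Map an identifier to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def releasable_counts(fingerprints, minimum=MIN_CELL_COUNT):
    """Count identical fingerprints, dropping groups smaller than minimum."""
    counts = Counter(fingerprints)
    return {fp: n for fp, n in counts.items() if n >= minimum}


# Made-up fingerprints: six patients share one pattern, two share another.
fingerprints = [("surgery", "chemotherapy")] * 6 + [("radiotherapy",)] * 2
released = releasable_counts(fingerprints)

print(pseudonymise("patient-0001")[:16])
print(released)
```

The same identifier always maps to the same pseudonym, so records can still be linked within a study, while the rarer fingerprint is withheld because it describes too few patients.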