Statistical tools for single cell gene expression analysis

Big Data in Medicine: Exemplars and Opportunities in Data Science

Daphne Ezer, Department of Genetics/Cambridge Systems Biology Centre

Daphne Ezer, Bertie Gottgens, Boris Adryan

Department of Genetics

Cambridge Systems Biology Centre


Single cell gene expression assays now produce vast amounts of data that can be used to probe cell-to-cell variability in tissues. These methods have been applied in a range of biomedical contexts, from tumor heterogeneity to neurodegenerative disorders, so statistically sound tools for analyzing this large new body of data are increasingly important. We have developed two such tools. First, we developed a clustering algorithm that takes into account the family of distributions expected in single cell gene expression data. This algorithm can distinguish gene expression heterogeneity caused by bursty gene expression from heterogeneity caused by mixtures of different cell types. Second, we developed a statistical method for evaluating the probability that the burst frequency or the transcription rate has been differentially regulated between two populations of cells, which allows us to distinguish between the regulatory mechanisms used to control gene expression. After validating these approaches on simulated data, we applied both to investigate how key transcription factors are regulated during hematopoiesis.
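
The abstract does not specify the distributions or estimators used, so the following is only an illustrative sketch and not the authors' method. It assumes that bursty expression can be approximated by a negative binomial (gamma-Poisson) count distribution, with the gamma shape playing the role of burst frequency and the gamma scale the role of mean burst size, and that a mixture of cell types can be approximated by two Poisson populations. All function names and parameter values are hypothetical. The sketch shows why the two sources of heterogeneity are hard to tell apart from overdispersion alone: both give a variance well above the mean.

```python
import numpy as np


def simulate_bursty(n_cells, burst_freq, burst_size, rng):
    """Counts from one cell type with bursty expression (assumed model).

    A gamma-distributed rate followed by Poisson sampling gives a
    negative binomial distribution with mean burst_freq * burst_size.
    """
    rates = rng.gamma(shape=burst_freq, scale=burst_size, size=n_cells)
    return rng.poisson(rates)


def simulate_two_cell_types(n_cells, low_mean, high_mean, frac_high, rng):
    """Counts from a mixture of two non-bursty (Poisson) cell types."""
    is_high = rng.random(n_cells) < frac_high
    return rng.poisson(np.where(is_high, high_mean, low_mean))


def fano_factor(counts):
    """Variance-to-mean ratio; equals 1 for a pure Poisson sample."""
    return counts.var(ddof=1) / counts.mean()


def moment_estimates(counts):
    """Method-of-moments estimates under the negative binomial assumption.

    For this model, variance / mean = 1 + burst_size and
    mean = burst_freq * burst_size.
    """
    burst_size = fano_factor(counts) - 1.0
    burst_freq = counts.mean() / burst_size
    return burst_freq, burst_size


rng = np.random.default_rng(0)
bursty = simulate_bursty(5000, burst_freq=2.0, burst_size=10.0, rng=rng)
mixed = simulate_two_cell_types(5000, low_mean=2.0, high_mean=40.0,
                                frac_high=0.5, rng=rng)

freq_hat, size_hat = moment_estimates(bursty)
print(f"bursty: Fano = {fano_factor(bursty):.1f}, "
      f"burst_freq ~ {freq_hat:.2f}, burst_size ~ {size_hat:.2f}")
print(f"mixed:  Fano = {fano_factor(mixed):.1f}")
```

Both simulated samples are strongly overdispersed, so a summary such as the Fano factor cannot separate bursting from a mixture of cell types; separating them requires comparing the shape of the full distribution, which is the kind of problem a distribution-aware clustering approach is meant to address.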