Performing large scale conditional analysis in GWAS: How to better exploit summary statistics?

Back to: 
Big Data in Medicine: Exemplars and Opportunities in Data Science

Paul Newcombe, MRC Biostatistics Unit

Abstract

Genome-Wide Association Studies (GWASs), in which hundreds of thousands of genetic variants are tested for association with disease and/or disease traits, have proved a hugely important tool in identifying causal mutations. Recently, large-scale meta-analyses of such GWASs, accumulating information over tens of thousands of people, have boosted the number of known signals for some traits even further. However, the availability of numerous correlated predictors presents some analytical challenges. Typically, GWASs analyse each variant one at a time. In the case of GWAS meta-analyses, this is often all that is possible when the individual cohorts only share summary data with one another. Inference from marginal one-at-a-time tests such as these has limitations: in particular, they offer little insight into the location and number of causal variants. Many variants may produce significant associations via correlation with a causal mutation, leading to numerous false positives among the top-ranked results.

Ideally we would re-analyse published marginal GWAS results using a joint multi-SNP model to account for the correlation structure. We present a novel Bayesian algorithm which utilises publicly available information on genetic correlations to enable inference on the joint model underlying a set of marginal GWAS statistics. We have implemented the method in an efficient variable selection framework that infers posterior probabilities for each variant, adjusted for all other variants. Through a series of realistic simulation studies we demonstrate substantial gains in positive predictive value among top-ranked associations when marginal statistics are re-analysed using our method. We also present an application to published results from MAGIC (Meta-Analysis of Glucose- and Insulin-Related Traits Consortium), a GWAS meta-analysis of >15,000 people, in which we re-analyse several genomic regions that produced multiple significant associations with glucose levels two hours after oral stimulation.
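
To illustrate why correlation information lets a joint model be recovered from one-at-a-time results, here is a minimal sketch of the standard least-squares identity linking marginal and joint effects. It is not the Bayesian variable selection algorithm described in this abstract. The function name `marginal_to_joint`, the simulated data, and the use of Gaussian stand-ins for genotypes are all assumptions made for illustration. With standardised variants, the joint effects are approximately the marginal effects multiplied by the inverse of the variant correlation (LD) matrix.

```python
import numpy as np

def marginal_to_joint(beta_marginal, ld_corr):
    """Hypothetical helper: joint effects from marginal effects.

    Assumes every variant is standardised (mean 0, variance 1) and that
    ld_corr is the variant-by-variant correlation matrix.
    Solves ld_corr @ beta_joint = beta_marginal.
    """
    return np.linalg.solve(ld_corr, beta_marginal)

# Simulated example (assumption: Gaussian stand-ins for genotypes).
rng = np.random.default_rng(0)
n, rho = 50_000, 0.8
ld_true = np.array([[1.0, rho, 0.0],
                    [rho, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
X = rng.multivariate_normal(np.zeros(3), ld_true, size=n)
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardise each variant

beta_true = np.array([0.2, 0.0, 0.0])      # only variant 0 is causal
y = X @ beta_true + rng.normal(size=n)
y = y - y.mean()

# One-at-a-time (marginal) slopes: x'y / x'x, with x'x = n after standardising.
beta_marginal = X.T @ y / n

# In-sample correlation matrix; in practice this would come from a
# reference panel, which makes the identity approximate rather than exact.
ld_hat = X.T @ X / n
beta_joint = marginal_to_joint(beta_marginal, ld_hat)

print("marginal:", np.round(beta_marginal, 3))
print("joint:   ", np.round(beta_joint, 3))
```

In this sketch the non-causal variant 1 shows a sizeable marginal effect purely through its correlation with variant 0, while its joint effect shrinks towards zero. When the correlation matrix is taken from an external reference population rather than the study sample, the recovered joint effects are only approximate.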