A computational approach to the genetic basis of antigenic change in influenza A

Sarah James, Department of Zoology

Sarah L James and Derek J Smith

Department of Zoology, University of Cambridge

Center for Pathogen Evolution, Department of Zoology, University of Cambridge

World Health Organization (WHO) Collaborating Center for Modeling, Evolution, and Control of Emerging Infectious Diseases, Cambridge

Abstract

Seasonal influenza epidemics are estimated to cause 3–5 million cases of severe illness and 250,000–500,000 deaths annually. Vaccination is recommended for people over 65 years, children aged 2–4 years and those in clinical risk groups. Influenza virus evades the immune response through mutations in the surface protein haemagglutinin, a process called antigenic variation. The vaccine is therefore updated as recommended by the World Health Organization (WHO), based on worldwide phenotypic, genetic and epidemiological data. It would be valuable to be able to predict antigenic phenotype from genetic data.

Antigenic cartography is a quantitative method for simplifying and visualising the results of antigen cross-reactivity assays. The behaviour of a viral strain in these assays is related to the amino acid sequence of the HA1 domain of the haemagglutinin protein. We used the position of a viral strain in the antigenic map as a description of its antigenic phenotype. We performed a regression-based analysis of the genetic correlates of antigenicity in several datasets, comparing two methods. The main dataset consisted of 253 viral strains from 1968–2003 for which both genetic and antigenic data were available. The first method compared the distance between two strains in the map with the mutations needed to convert one strain into the other. The second method assumed that the position of a viral strain in the map was a function of its genetic sequence.
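As an illustration only, the first (distance-based) method might be sketched as follows. This is not the authors' implementation: the function names, the 0/1 encoding of substitutions, the use of ordinary least squares and the toy data are all assumptions made for the example. Each pair of strains contributes one observation, the response is the distance between the two strains in the map, and each distinct substitution separating them is a predictor.

```python
import itertools

import numpy as np


def differing_sites(seq_a, seq_b):
    """Yield (position, residue pair) for each site where two aligned sequences differ."""
    for pos, (res_a, res_b) in enumerate(zip(seq_a, seq_b), start=1):
        if res_a != res_b:
            # frozenset ignores direction, so K->E and E->K share one feature
            # (an assumption of this sketch, not stated in the abstract).
            yield pos, frozenset((res_a, res_b))


def fit_distance_regression(sequences, coords):
    """Regress pairwise map distance on the substitutions separating each pair.

    sequences: dict mapping strain name to aligned amino acid sequence
    coords:    dict mapping strain name to map coordinates
    Returns (feature_index, weights); weights[feature_index[key]] is the
    estimated contribution of that substitution to map distance.
    """
    names = sorted(sequences)
    pairs = list(itertools.combinations(names, 2))

    # Assign one column to every distinct substitution seen in any pair.
    feature_index = {}
    for a, b in pairs:
        for key in differing_sites(sequences[a], sequences[b]):
            feature_index.setdefault(key, len(feature_index))

    X = np.zeros((len(pairs), len(feature_index)))
    y = np.zeros(len(pairs))
    for row, (a, b) in enumerate(pairs):
        for key in differing_sites(sequences[a], sequences[b]):
            X[row, feature_index[key]] = 1.0
        delta = np.asarray(coords[a], dtype=float) - np.asarray(coords[b], dtype=float)
        y[row] = np.linalg.norm(delta)

    # Ordinary least squares without an intercept (assumed for this sketch).
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return feature_index, weights


# Hypothetical toy data: three strains, three-residue sequences.
toy_sequences = {"A": "KNT", "B": "ENT", "C": "ENA"}
toy_coords = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (2.5, 0.0)}

index, weights = fit_distance_regression(toy_sequences, toy_coords)
for (pos, residues), col in index.items():
    print(pos, "/".join(sorted(residues)), round(float(weights[col]), 3))
```

The second method would instead treat each strain as one observation and regress its map coordinates on features of its own sequence. Because pairwise observations share strains, they are not independent, so a plain least-squares fit like this one does nothing to control for population structure.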

These methods were evaluated by their success in prediction and by their ability to identify single amino acid mutations experimentally validated to have large antigenic effects. Both methods fairly reliably identified the amino acids known to cause antigenic variation. However, the second method (position-based regression) had inherent limitations in prediction. The results also highlighted the need to control for population structure. Further optimisation and experimental verification are needed to explore the utility of these techniques.