Identification of human-to-human transmissibility factors in PB2 proteins of influenza A by large-scale mutual information analysis

Olivo Miotto et al. BMC Bioinformatics. 2008.

Abstract

Background: The identification of mutations that confer unique properties to a pathogen, such as host range, is of fundamental importance in the fight against disease. This paper describes a novel method for identifying amino acid sites that distinguish specific sets of protein sequences, by comparative analysis of matched alignments. The use of mutual information to identify distinctive residues responsible for functional variants makes this approach highly suitable for analyzing large sets of sequences. To support mutual information analysis, we developed the AVANA software, which utilizes sequence annotations to select sets for comparison, according to user-specified criteria. The method presented was applied to an analysis of influenza A PB2 protein sequences, with the objective of identifying the components of adaptation to human-to-human transmission, and reconstructing the mutation history of these components.

Results: We compared over 3,000 PB2 protein sequences of human-transmissible and avian isolates, to produce a catalogue of sites involved in adaptation to human-to-human transmission. This analysis identified 17 characteristic sites, five of which have been present in human-transmissible strains since the 1918 Spanish flu pandemic. Sixteen of these sites are located in functional domains, suggesting they may play functional roles in host-range specificity. The catalogue of characteristic sites was used to derive sequence signatures from historical isolates. These signatures, arranged in chronological order, reveal an evolutionary timeline for the adaptation of the PB2 protein to human hosts.

Conclusion: By providing the most complete elucidation to date of the functional components participating in PB2 protein adaptation to humans, this study demonstrates that mutual information is a powerful tool for comparative characterization of sequence sets. In addition to confirming previously reported findings, several novel characteristic sites within PB2 are reported. Sequence signatures generated using the characteristic sites catalogue characterize concisely the adaptation characteristics of individual isolates. Evolutionary timelines derived from signatures of early human influenza isolates suggest that characteristic variants emerged rapidly, and remained remarkably stable through subsequent pandemics. In addition, the signatures of human-infecting H5N1 isolates suggest that this avian subtype has low pandemic potential at present, although it presents more human adaptation components than most avian subtypes.
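The per-site comparison described in the Background can be illustrated with a minimal sketch. It assumes that the mutual information at a site is measured between residue identity and set membership, computed as the entropy of the merged column minus the size-weighted entropy within each set. The function names and the toy sequences are hypothetical and are not taken from AVANA; gaps and ambiguous residues are not handled.

```python
from collections import Counter
from math import log2


def column_entropy(residues):
    """Shannon entropy (bits) of the residues observed at one alignment column."""
    counts = Counter(residues)
    total = len(residues)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def site_mutual_information(set_a, set_b, pos):
    """MI (bits) between residue identity and set membership at column pos (0-based)."""
    col_a = [seq[pos] for seq in set_a]
    col_b = [seq[pos] for seq in set_b]
    n_a, n_b = len(col_a), len(col_b)
    n = n_a + n_b
    # I(residue; set) = H(residue) - H(residue | set)
    h_merged = column_entropy(col_a + col_b)
    h_within = (n_a / n) * column_entropy(col_a) + (n_b / n) * column_entropy(col_b)
    return h_merged - h_within


# Toy aligned sequences (made up for illustration, not real PB2 data)
avian = ["MEE", "MEE", "MEK", "MEE"]
human = ["MKK", "MKK", "MKE", "MKK"]

mi_profile = [site_mutual_information(avian, human, i) for i in range(3)]
print(mi_profile)
```

In this toy case the conserved first column carries no information about set membership, the second column separates the two sets completely, and the third column separates them only partially. Under this definition, a characteristic site would appear as a peak in the MI profile.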

Figures

Figure 1

Effect of set size on information entropy. The probability density of entropy values at four sites of the Influenza A PB2 proteins is plotted for alignments of decreasing sequence count N (graph A: N = 250; graph B: N = 50; graph C: N = 20). For each graph, we constructed 200 random alignments of the required size from the PB2 master alignment. The entropy mean and standard deviation measured from these alignments were used to plot the normal probability distributions shown in this chart. The entropy values for different sites are well-separated in large sequence sets (plot A) while the likelihood of distinguishing medium-entropy sites from high- or low-entropy sites drops dramatically at low sequence counts (plot C). The sites were selected based on their equally-spaced entropy values.
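The resampling procedure in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: sequences are drawn without replacement, the master alignment is a made-up single column, and `entropy_spread` is a hypothetical helper name.

```python
import random
from collections import Counter
from math import log2
from statistics import mean, stdev


def column_entropy(residues):
    """Shannon entropy (bits) of the residues observed at one alignment column."""
    counts = Counter(residues)
    total = len(residues)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def entropy_spread(alignment, pos, n, trials=200, seed=0):
    """Mean and standard deviation of the entropy at column pos over random subsets of n sequences."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        subset = rng.sample(alignment, n)  # random alignment of the required size
        values.append(column_entropy([seq[pos] for seq in subset]))
    return mean(values), stdev(values)


# Made-up master alignment: a single column with 70% E, 20% K, 10% Q
master = ["E"] * 700 + ["K"] * 200 + ["Q"] * 100

for n in (250, 50, 20):
    mu, sigma = entropy_spread(master, 0, n)
    print(f"N = {n}: mean = {mu:.3f}, sd = {sigma:.3f}")
```

The mean and standard deviation returned for each N are the two parameters needed to draw a normal curve like those in the figure. The spread widens as N shrinks, which is why entropy values at neighbouring sites become hard to tell apart in small sets.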

Figure 2

Effect of set size bias on mutual information. In both graphs, the y-axis represents the measured mutual information (MI) between two sets of influenza A PB2 protein sequences, comprising human and avian sequences respectively. The x-axis represents the size ratio Nh/Na, where Nh and Na are the sequence count in the human and avian sets respectively. A) Changes in MI at selected alignment sites as Nh is varied (Na = 719). MI values fall rapidly as the ratio decreases, especially at sites with high MI. B) Each data point is computed by averaging the MI obtained by comparing the human set with 200 random subsampled sets of avian sequences with the same sequence count. The estimated MI values remain stable up to a size ratio of approximately 1:10. At very low ratios, increased sampling errors due to small set size tend to lower the reliability of the estimate.
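The averaging step described for graph B can be sketched as below. The sketch assumes that the larger set is repeatedly subsampled, without replacement, to the size of the smaller set, and that MI is defined between residue identity and set membership. The residue counts are invented for illustration, and `balanced_site_mi` is a hypothetical name.

```python
import random
from collections import Counter
from math import log2
from statistics import mean


def column_entropy(residues):
    """Shannon entropy (bits) of the residues observed at one alignment column."""
    counts = Counter(residues)
    total = len(residues)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def site_mi(col_a, col_b):
    """MI (bits) between residue identity and set membership for one column."""
    n_a, n_b = len(col_a), len(col_b)
    n = n_a + n_b
    h_within = (n_a / n) * column_entropy(col_a) + (n_b / n) * column_entropy(col_b)
    return column_entropy(col_a + col_b) - h_within


def balanced_site_mi(small_col, large_col, trials=200, seed=0):
    """Average MI over random subsamples of the larger column, each the size of the smaller one."""
    rng = random.Random(seed)
    k = len(small_col)  # assumes len(small_col) <= len(large_col)
    return mean(site_mi(small_col, rng.sample(large_col, k)) for _ in range(trials))


# Made-up residue counts at one site: a small human set and a large avian set
human_col = ["K"] * 18 + ["E"] * 2
avian_col = ["E"] * 680 + ["K"] * 39

raw = site_mi(human_col, avian_col)
balanced = balanced_site_mi(human_col, avian_col)
print(f"raw MI = {raw:.3f}, size-balanced MI = {balanced:.3f}")
```

With unequal set sizes the raw MI is capped by the entropy of the set label itself, which is small when one set dominates. Comparing equal-sized sets removes that cap, at the cost of more sampling noise when the smaller set is very small.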

Figure 3

Screenshot of the Antigenic Variability Analyzer (AVANA). This screenshot shows the AVANA tools used in a comparison of the A2A (top) and HxN2 (bottom) subsets. The horizontal axis corresponds to the positions along the alignment, while the vertical axes represent the entropy of each subset (blue), and mutual information (green) between the two subsets. Characteristic sites are identifiable by the presence of MI peaks. On the left-hand side, AVANA displays the residue statistics at the currently selected position: the E627K characteristic mutation is shown in this example.

Figure 4

Characteristic sites for human-to-human transmission (H2H) identified in the PB2 protein of the influenza A virus. The sites, whose positions are indicated in the circles, are arranged along the length of the protein, with the avian (A2A) variants and the H2H variants indicated above and below the circles respectively. Where multiple variants are present at a site, they are shown in decreasing order of frequency. The coloured lines in the upper part of the figure show the extent of identified PB2 functional domains: the binding regions of PB2 to the PB1 and NP proteins [33] are shown in red and green respectively; the RNA cap binding regions [34, 35] in blue; and the nuclear localization signals (NLS) [32] in orange. Except for site 292, all characteristic sites identified lie within one or two functional domains. The lower part of the figure shows characteristic sites previously identified in other studies [7, 8].

Figure 5

Evolution and reassortment of human Influenza A viruses. This figure (adapted from [3]) shows how human-transmissible Influenza A subtypes were acquired from the avian pool during 20th-century pandemics. A full complement of eight gene segments of avian origin gave rise to the 1918 Spanish flu, while the following two pandemics followed the acquisition of a smaller number of avian genes through reassortment. In 1957, the H2N2 Asian flu replaced the HA, NA and PB1 segments, while the H3N2 Hong Kong pandemic of 1968 replaced the HA and PB1 segments only. In each of these pandemics, the new subtype fully replaced the previously circulating subtype. The minor Russian pandemic of 1977 was caused by the reintroduction of an H1N1 strain almost identical to that circulating prior to 1957, leading to the widely held view that it was caused by the release of 20-year-old frozen viruses. The H1N1 strain has not supplanted H3N2, and the two lineages co-circulate in the human population to the present day; the recently emerged H1N2 subtype has arisen from their reassortment. All currently circulating PB2 proteins are therefore thought to have descended from the Spanish flu strain, although the PB2 protein associated with HxN2 has diverged significantly from that of the H1N1 lineage.

Figure 6

Timeline of adaptation to human-to-human transmission for the influenza A PB2 protein. Using the H2H characteristic variant pattern (see Fig. 4), we produced signatures for each available human sequence isolated before 1970, and arranged them in chronological order. The signature columns show the residues observed at each of the characteristic sites, in the order given in Figure 4. Each signature is annotated with subtype, year and country of isolation, and isolate name. The first and the last pattern of the alignment are the consensus signatures for avian and human-to-human transmissible sequences respectively. Avian characteristic variants are shown on a dark blue background, human characteristic variants on a yellow background, and all other variants are on white. Red horizontal lines indicate the start of the 1957 and 1968 pandemics, which introduced the H2N2 and H3N2 subtypes respectively. The GenPept accession numbers for all sequences used are listed in Table S1 in Additional file 1.
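Signature extraction and chronological ordering, as described in this caption, might be sketched as follows. Everything in the example is hypothetical: the site positions, the avian and human variant tables, and the isolates are toy values chosen to keep the sequences short, and they do not correspond to the 17 PB2 sites of Figure 4.

```python
def signature(sequence, sites):
    """Residues at the given 1-based alignment positions, joined in site order."""
    return "".join(sequence[p - 1] for p in sites)


def classify(sequence, sites, avian_variants, human_variants):
    """Label each site: 'H' for a human variant, 'A' for an avian variant, '-' otherwise."""
    labels = []
    for p in sites:
        residue = sequence[p - 1]
        if residue in human_variants[p]:
            labels.append("H")
        elif residue in avian_variants[p]:
            labels.append("A")
        else:
            labels.append("-")
    return "".join(labels)


# Hypothetical characteristic sites and variants (toy values, not the PB2 catalogue)
sites = [2, 5, 7]
avian_variants = {2: "E", 5: "A", 7: "T"}
human_variants = {2: "K", 5: "S", 7: "M"}

# Hypothetical isolates: (year, name, aligned sequence)
isolates = [
    (1957, "toy/B/57", "MKRASLT"),
    (1918, "toy/A/18", "MERAALT"),
    (1968, "toy/C/68", "MKRAQLM"),
]

# Arrange signatures in chronological order, as in a timeline figure
timeline = [
    (year, name, signature(seq, sites), classify(seq, sites, avian_variants, human_variants))
    for year, name, seq in sorted(isolates)
]
for row in timeline:
    print(*row)
```

The label string plays the role of the background colours in the figure: each site is marked as carrying a human characteristic variant, an avian one, or neither.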

Figure 7

Signatures of swine-isolated influenza A PB2 proteins against the H2H characteristic variant pattern. The H2H characteristic variant pattern is the same as that used in Figure 6. The symbol 'X' at a characteristic site indicates that the residue is unknown, due to an incompletely sequenced protein. Some patterns, whose signatures are represented by other retained sequences, were removed from the alignment to make the figure more compact. The GenPept accession numbers for all sequences used are listed in Table S2 in Additional file 1.

Figure 8

Signatures of human-isolated H5N1 influenza A PB2 proteins against the H2H characteristic variant pattern. The H2H characteristic variant pattern is the same as that used in Figure 6. Some patterns, whose signatures are represented by other retained sequences, were removed from the alignment to make the figure more compact. The GenPept accession numbers for all sequences used are listed in Table S3 in Additional file 1.

References

    1. Baigent SJ, McCauley JW. Influenza type A in humans, mammals and birds: determinants of virus virulence, host-range and interspecies transmission. Bioessays. 2003;25:657–671. doi: 10.1002/bies.10303.
    2. Mills CE, Robins JM, Bergstrom CT, Lipsitch M. Pandemic influenza: risk of multiple introductions and the need to prepare for them. PLoS Med. 2006;3:e135. doi: 10.1371/journal.pmed.0030135.
    3. Webster RG, Bean WJ, Gorman OT, Chambers TM, Kawaoka Y. Evolution and ecology of influenza A viruses. Microbiol Rev. 1992;56:152–179.
    4. Taubenberger JK, Reid AH, Janczewski TA, Fanning TG. Integrating historical, clinical and molecular genetic data in order to explain the origin and virulence of the 1918 Spanish influenza virus. Philos Trans R Soc Lond B Biol Sci. 2001;356:1829–1839. doi: 10.1098/rstb.2001.1020.
    5. Neumann G, Kawaoka Y. Host range restriction and pathogenicity in the context of influenza pandemic. Emerg Infect Dis. 2006;12:881–886.
