PolyTB: a genomic variation map for Mycobacterium tuberculosis - PubMed

doi: 10.1016/j.tube.2014.02.005. Epub 2014 Feb 15.

Mark Preston 2, José Afonso Guerra-Assunção 3, Grant Hill-Cawthorn 4, David Harris 5, João Perdigão 6, Miguel Viveiros 7, Isabel Portugal 6, Francis Drobniewski 8, Sebastien Gagneux 9, Judith R Glynn 3, Arnab Pain 10, Julian Parkhill 5, Ruth McNerney 2, Nigel Martin 11, Taane G Clark 12

Francesc Coll et al. Tuberculosis (Edinb). 2014 May.

Abstract

Tuberculosis (TB), caused by Mycobacterium tuberculosis (Mtb), is the second major cause of death from an infectious disease worldwide. Recent advances in DNA sequencing are making it possible to generate whole-genome information for clinical isolates of the M. tuberculosis complex (MTBC). The identification of informative genetic variants, such as phylogenetic markers and those associated with drug resistance or virulence, will help barcode Mtb in the context of epidemiological, diagnostic and clinical studies. Mtb genomic datasets are increasingly available as raw sequences, which can be difficult and computationally intensive to process and to compare across studies. Here we have processed the raw sequence data (>1500 isolates, eight studies) to compile a catalogue of SNPs (n = 74,039, 63% non-synonymous, 51.1% in more than one isolate, i.e. non-private), small indels (n = 4810) and larger structural variants (n = 800). We have developed the PolyTB web-based tool (http://pathogenseq.lshtm.ac.uk/polytb) to visualise the resulting variation and important meta-data (e.g. in silico inferred strain-types, location) within geographical map and phylogenetic views. This resource will allow researchers to identify polymorphisms within candidate genes of interest, as well as examine the genomic diversity and distribution of strains. PolyTB source code is freely available to researchers wishing to develop similar tools for their pathogen of interest.
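The abstract defines a non-private SNP as one found in more than one isolate. As a minimal sketch of that definition (not the authors' pipeline), the following Python computes the non-private fraction from a mapping of SNP position to the isolates carrying the alternative allele. The function name, data layout, positions and isolate IDs are all hypothetical toy values.

```python
def non_private_fraction(carriers):
    """Return the fraction of SNPs seen in more than one isolate.

    `carriers` maps a SNP position to the set of isolate IDs that carry
    the alternative allele (hypothetical layout, for illustration only).
    """
    if not carriers:
        return 0.0
    # A SNP is non-private when at least two isolates carry it.
    shared = sum(1 for isolates in carriers.values() if len(isolates) > 1)
    return shared / len(carriers)


# Toy data: positions and isolate IDs are invented.
toy_carriers = {
    100: {"iso1", "iso2"},
    200: {"iso3"},
    300: {"iso1", "iso4", "iso5"},
    400: {"iso2"},
}

print(non_private_fraction(toy_carriers))
```

In the toy data, two of the four SNPs are carried by more than one isolate, so half are non-private.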

Keywords: Database; Genomics; Molecular epidemiology; Mycobacterium tuberculosis; Software; Whole-genome sequencing.

Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.

Figures

Figure 1

RAxML maximum likelihood phylogenetic tree constructed for all 1470 isolates (spoligotype colour-coded). Radial phylogram representation of the best-scoring maximum likelihood phylogenetic tree constructed using RAxML software. Samples are colour-coded by spoligotype strain showing a clear correlation of SNP and spoligotype clustering.

Figure 2

Polymorphism frequency and density plots. (a) SNP frequency bar plot; (b) SNP density plots for all, coding, non-coding and Tuberculist-functional annotated families; (c) Small indel frequency plot; (d) Indel density plots for all, coding, non-coding and functional-grouped coding indels.

Figure 3

Polymorphisms at the rpoB-rpoC region associated with rifampicin resistance (Browser View). Genetic variants are shown at the rpoB and rpoC genes, loci known to be associated with rifampicin resistance. Synonymous SNPs (sSNPs) are coloured in black, non-synonymous SNPs (nsSNPs) in red, and small insertions and deletions in blue and green, respectively. Moving the cursor over a variant displays an information box with further annotation, including nucleotide, codon and amino acid changes for SNPs, and length and sequence for indels. The Locations and Spoligotypes tracks are placed as colour-coded vertical bars at the left-hand side of the genomic plot and provide information about the samples. Sixty isolates are shown, 30 from Malawi (colour-coded in red in the Location bar) and 30 from Uganda (shown in green). Patterns of SNP difference can be observed when comparing isolates from different populations: Kampala isolates harbour many more nsSNPs in the rpoB gene than Malawian isolates. The observed nsSNPs are likely to be the underlying cause of rifampicin resistance (Clark et al., 2013). In fact, rpoB-516 (A → T SNP at 761,110 bp), rpoB-526 (G → T at 761,139 bp and A → G at 761,140 bp) and rpoB-531 (C → G at 761,155 bp) mutations are observed in Ugandan isolates, and correspond to nsSNPs already reported as rifampicin resistance markers.

Figure 4

SNP associated with lineage 1 (EAI) in Tanzanian and Malawian populations (Map view). Allele frequencies are shown for the chosen polymorphic position as pie charts, either alone or combined with in silico inferred spoligotypes (Coll et al., 2012), to allow the visual detection of relationships between certain alleles and strain types. Reference allele frequency portions of the pie charts are coloured in blue, while alternative (i.e. non-reference) allele frequencies are shown in red. Outer chart portions representing relative strain type frequencies are colour-coded by main spoligotype families (AFRI, BOV, Beijing, CAS, EAI, LAM, Manu, S, T and X). In this particular case, the SNP at position 4,411,016 bp is found to be associated with lineage 1 (EAI) strains in the Tanzania and Karonga (Malawi) populations, visualised as the red portion of the inner pie chart linking with the purple portions of the outer pie in both settings.
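The inner pie charts in the Map view show reference versus alternative allele frequencies per location. A minimal Python sketch of that calculation follows, assuming one allele call per isolate at a single position. The function name, input layout and toy calls are hypothetical and are not taken from the PolyTB source code.

```python
from collections import Counter, defaultdict


def allele_frequencies(calls, ref_allele):
    """Return {location: {"ref": fraction, "alt": fraction}}.

    `calls` is an iterable of (location, allele) pairs, one per isolate,
    for a single polymorphic position (hypothetical layout).
    """
    counts = defaultdict(Counter)
    for location, allele in calls:
        # Any allele other than the reference is pooled as "alt".
        key = "ref" if allele == ref_allele else "alt"
        counts[location][key] += 1

    freqs = {}
    for location, c in counts.items():
        total = c["ref"] + c["alt"]
        freqs[location] = {"ref": c["ref"] / total, "alt": c["alt"] / total}
    return freqs


# Toy calls: the alleles and their counts are invented.
toy_calls = [
    ("Karonga", "G"), ("Karonga", "A"), ("Karonga", "A"), ("Karonga", "A"),
    ("Tanzania", "G"), ("Tanzania", "A"),
]

for location, f in allele_frequencies(toy_calls, ref_allele="G").items():
    print(location, f)
```

Each resulting pair of fractions corresponds to the blue (reference) and red (alternative) portions of one pie chart.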

Figure 5

SNP-based neighbour-joining phylogenetic tree of 140 isolates belonging to four different locations (Phylogenetic view). A neighbour-joining phylogenetic tree based on pre-calculated SNP distances is built in real time for the set of 140 isolates from Shanghai (China), Hamburg (Germany), Karonga (Malawi) and Kampala (Uganda). Spoligotype lineages and locations are colour-coded as bar charts around the tree (the outer bar representing locations and the inner one spoligotypes) to enable the visual identification of correlations between spoligotype/location and phylogenetic clustering. A table summarising all colour codes is shown at the left-hand side of the page.
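A neighbour-joining tree takes a pairwise distance matrix as input. The sketch below shows one plausible way to pre-calculate SNP distances: counting the positions at which two isolates differ. It assumes each isolate is represented by a string of alleles at the same ordered SNP positions. How PolyTB actually computes its distances is not stated here, the tree-building step itself is not shown, and all names and toy profiles are hypothetical.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count the positions at which two equal-length allele strings differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same SNP positions")
    return sum(1 for x, y in zip(a, b) if x != y)


def distance_matrix(profiles):
    """Return a symmetric nested dict of pairwise SNP distances."""
    names = sorted(profiles)
    dist = {n: {m: 0 for m in names} for n in names}
    for a, b in combinations(names, 2):
        d = snp_distance(profiles[a], profiles[b])
        dist[a][b] = d
        dist[b][a] = d
    return dist


# Toy profiles: alleles at five shared SNP positions (invented).
toy_profiles = {
    "iso_A": "ACGTA",
    "iso_B": "ACGTT",
    "iso_C": "TCGAT",
}

matrix = distance_matrix(toy_profiles)
for name in sorted(matrix):
    print(name, matrix[name])
```

Storing the matrix ahead of time means a tree for any chosen subset of isolates can be built from a lookup rather than by recomparing sequences, which is consistent with the caption's description of pre-calculated distances and real-time tree construction.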
