A data compendium associating the genomes of 12,289 Mycobacterium tuberculosis isolates with quantitative resistance phenotypes to 13 antibiotics

The CRyPTIC Consortium. PLoS Biol. 2022.

Abstract

The Comprehensive Resistance Prediction for Tuberculosis: an International Consortium (CRyPTIC) presents here a data compendium of 12,289 Mycobacterium tuberculosis global clinical isolates, all of which have undergone whole-genome sequencing and have had their minimum inhibitory concentrations to 13 antitubercular drugs measured in a single assay. It is the largest matched phenotypic and genotypic dataset for M. tuberculosis to date. Here, we provide a summary detailing the breadth of data collected, along with a description of how the isolates were selected, collected, and uniformly processed in CRyPTIC partner laboratories across 23 countries. The compendium contains 6,814 isolates resistant to at least 1 drug, including 2,129 samples that fully satisfy the clinical definitions of rifampicin resistant (RR), multidrug resistant (MDR), pre-extensively drug resistant (pre-XDR), or extensively drug resistant (XDR). The data are enriched for rare resistance-associated variants, and the current limits of genotypic prediction of resistance status (sensitive/resistant) are presented by using a genetic mutation catalogue, along with the presence of suspected resistance-conferring mutations for isolates resistant to the newly introduced drugs bedaquiline, clofazimine, delamanid, and linezolid. Finally, a case study of rifampicin monoresistance demonstrates how this compendium could be used to advance our genetic understanding of rare resistance phenotypes. The data compendium is fully open source and it is hoped that it will facilitate and inspire future research for years to come.

Conflict of interest statement

I have read the journal’s policy and the authors of this manuscript have the following competing interests: E.R. is employed by Public Health England and holds an honorary contract with Imperial College London. I.F.L. is Director of the Scottish Mycobacteria Reference Laboratory. S.N. receives funding from German Center for Infection Research, Excellenz Cluster Precision Medicine in Chronic Inflammation, Leibniz Science Campus Evolutionary Medicine of the LUNG (EvoLUNG)tion EXC 2167. P.S. is a consultant at Genoscreen. T.R. is funded by NIH and DoD and receives salary support from the non-profit organization FIND. T.R. is a co-founder, board member and shareholder of Verus Diagnostics Inc, a company that was founded with the intent of developing diagnostic assays. Verus Diagnostics was not involved in any way with data collection, analysis or publication of the results. T.R. has not received any financial support from Verus Diagnostics. UCSD Conflict of Interest office has reviewed and approved T.R.’s role in Verus Diagnostics Inc. T.R. is a co-inventor of a provisional patent for a TB diagnostic assay (provisional patent #: 63/048.989). T.R. is a co-inventor on a patent associated with the processing of TB sequencing data (European Patent Application No. 14840432.0 & USSN 14/912,918). T.R. has agreed to “donate all present and future interest in and rights to royalties from this patent” to UCSD to ensure that he does not receive any financial benefits from this patent. S.S. is working and holding ESOPs at HaystackAnalytics Pvt. Ltd. (Product: Using whole genome sequencing for drug susceptibility testing for Mycobacterium tuberculosis). G.F.G. is listed as an inventor on patent applications for RBD-dimer-based CoV vaccines. The patents for RBD-dimers as protein subunit vaccines for SARS-CoV-2 have been licensed to Anhui Zhifei Longcom Biopharmaceutical Co. Ltd, China.

Figures

Fig 1. Processing sequencing and minimum inhibitory concentration data for 15,211 Mycobacterium tuberculosis isolates (“full” dataset).

Briefly: Each isolate was DNA sequenced using an Illumina machine and plated onto 96-well plates (UKMYC5/6) containing 5–10× doubling dilutions of 13 antitubercular drugs for DST. Associated metadata (including country of origin and processing laboratory) was recorded. DNA variant calling and analysis was performed using Clockwork and Minos [47]. After 14 days, MIC measurements were taken by a trained scientist using Vizion, and the plate was photographed to also measure the MIC using the automated AMyGDA software and citizen scientists from BashTheBug [45]. After quality control procedures, phenotypic MIC data for 2,922 isolates were removed. The compendium therefore contains 15,211 isolates with WGS data (“full dataset”), 12,289 of which have matched quality assessed phenotypic data (“data compendium”). The raw sequence, VCFs, MICs, and binary resistance calls for the data compendium are presented in “CRyPTIC_reuse_table_20211019.csv” via an FTP site (see Methods), and the raw sequence and VCF files for those samples present in the full dataset are presented in “CRyPTIC_excluded_samples_20220607.tsv” via the same FTP site (see Methods). The data tables GENOTYPES.csv, VARIANTS.csv, MUTATIONS.csv, SAMPLES.csv, and PHENOTYPES.csv used for the analyses presented in this manuscript are also accessible via the FTP site (see Methods). DST, drug susceptibility testing; MIC, minimum inhibitory concentration; WGS, whole-genome sequencing.

Fig 2. Geographical distribution of 15,211 CRyPTIC Mycobacterium tuberculosis clinical isolates (“full” dataset).

The total number of isolates contributed by each country is depicted. Where the origin of an isolate was not known, the collection site identity was assigned (this occurred for 269 isolates in Germany, 17 isolates in India, 6 isolates in Peru, 885 isolates in Italy, 510 isolates in South Africa, 357 isolates in Sweden, 208 isolates in Taiwan, 1 isolate in Brazil, and 4 isolates in the UK). The base layer map was sourced from Natural Earth, which is in the public domain (see http://www.naturalearthdata.com/about/terms-of-use/), and created using the R packages “ggmap” and “maps” (see github.com/kerrimalone/Brankin_Malone_2022/).

Fig 3. Drug phenotype data for the CRyPTIC compendium.

(A) Frequency of resistance to each of 13 drugs in the data compendium. The total number of isolates with a binary phenotype (of any quality) for the corresponding drug is presented in Table 1. (B) Phenotypes of the 12,289 isolates with a binary phenotype for at least 1 drug. (C) Geographical distribution of phenotypes of 12,289 compendium isolates. Intensity of blue shows the percentage of isolates contributed that were categorised as susceptible to all 13 drugs (“%S”). Donut plots show the proportions of resistant phenotypes identified in (B) for countries contributing ≥100 isolates with drug resistance. (D) Proportions of resistance phenotypes in the 4 major Mycobacterium tuberculosis lineages. N is the number of isolates of the lineage called resistant to at least one of the 13 drugs. The base layer map in (C) was sourced from Natural Earth, which is in the public domain (see http://www.naturalearthdata.com/about/terms-of-use/), and created using the python library “geopandas” (see github.com/kerrimalone/Brankin_Malone_2022/). AMI, amikacin; BDQ, bedaquiline; CFZ, clofazimine; CRyPTIC, Comprehensive Resistance Prediction for Tuberculosis: an International Consortium; DLM, delamanid; EMB, ethambutol; ETH, ethionamide; INH, isoniazid; KAN, kanamycin; LEV, levofloxacin; LZD, linezolid; MDR, multidrug resistant; MXF, moxifloxacin; pre-XDR, pre-extensively drug resistant; RFB, rifabutin; RIF, rifampicin; RR, rifampicin resistant; XDR, extensively drug resistant.

Fig 4. Co-occurrence of resistance to 1 drug conditional on resistance to another drug, or to resistance background.

(A) The heatmap shows the probability of an isolate being resistant to Drug 2 if it is resistant to Drug 1; percentages are given in Table F in S1 File. (B-F) Percentage of isolates that are resistant to another of the 13 drugs in a background of (B) isoniazid susceptible + rifampicin susceptible (but resistant to at least one other antitubercular drug), (C) isoniazid resistant + rifampicin susceptible, (D) MDR/RR, (E) Pre-XDR, and (F) XDR. Only samples with definite phenotypes for RIF in MDR backgrounds and RIF and INH in non-MDR backgrounds and the additional drug are included. AMI, amikacin; BDQ, bedaquiline; CFZ, clofazimine; CRyPTIC, Comprehensive Resistance Prediction for Tuberculosis: an International Consortium; DLM, delamanid; EMB, ethambutol; ETH, ethionamide; INH, isoniazid; KAN, kanamycin; LEV, levofloxacin; LZD, linezolid; MDR, multidrug resistant (resistant to first-line drugs isoniazid and rifampicin); MXF, moxifloxacin; pre-XDR, pre-extensively drug resistant (MDR/RR + fluoroquinolone resistant); RFB, rifabutin; RIF, rifampicin; RR, rifampicin resistant; XDR, extensively drug resistant (MDR/RR + resistant to at least 1 fluoroquinolone and either bedaquiline or linezolid).
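The background definitions and the conditional probabilities in this caption can be illustrated with a minimal sketch. It is not the consortium's code: the function names, the dictionary layout (drug abbreviation mapped to "R" or "S"), and the toy isolates are all assumptions made for illustration, and treating a missing call as "not resistant" when assigning a background is a simplification.

```python
# Drug abbreviations follow the caption; LEV and MXF are the fluoroquinolones.
FLUOROQUINOLONES = ("LEV", "MXF")


def resistance_background(calls):
    """Assign one isolate to a background using the caption's definitions.

    `calls` maps a drug abbreviation to "R" or "S" (hypothetical layout).
    A missing call is treated as not resistant, which is an assumption.
    """
    def resistant(drug):
        return calls.get(drug) == "R"

    rif = resistant("RIF")
    fq = any(resistant(d) for d in FLUOROQUINOLONES)
    if rif and fq and (resistant("BDQ") or resistant("LZD")):
        return "XDR"      # MDR/RR + fluoroquinolone + (BDQ or LZD)
    if rif and fq:
        return "pre-XDR"  # MDR/RR + fluoroquinolone
    if rif and resistant("INH"):
        return "MDR"      # resistant to isoniazid and rifampicin
    if rif:
        return "RR"       # rifampicin resistant only
    return "other"


def conditional_resistance(isolates, drugs):
    """Estimate P(resistant to drug 2 | resistant to drug 1) per ordered pair.

    Only isolates with a definite call for both drugs are counted.
    Returns None for a pair with no eligible isolates.
    """
    probs = {}
    for d1 in drugs:
        for d2 in drugs:
            if d1 == d2:
                continue
            eligible = [c for c in isolates
                        if c.get(d1) == "R" and c.get(d2) in ("R", "S")]
            both = sum(1 for c in eligible if c[d2] == "R")
            probs[(d1, d2)] = both / len(eligible) if eligible else None
    return probs


# Toy, made-up calls; these are not values from the compendium.
toy_isolates = [
    {"RIF": "R", "INH": "R", "LEV": "S"},
    {"RIF": "R", "INH": "S", "LEV": "R"},
    {"RIF": "S", "INH": "R", "LEV": "S"},
    {"RIF": "R", "INH": "R", "LEV": "R", "BDQ": "R"},
]
backgrounds = [resistance_background(c) for c in toy_isolates]
cooccurrence = conditional_resistance(toy_isolates, ["RIF", "INH", "LEV"])
```

The matrix is not symmetric in general, since the denominator changes with the conditioning drug, which is why the heatmap distinguishes Drug 1 from Drug 2.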

Fig 5. Resistance to bedaquiline, clofazimine, delamanid, and linezolid among Mycobacterium tuberculosis compendium isolates.

(A) The prevalence (within these data) of resistance to BDQ, CFZ, DLM, and LZD per country of origin or collection site. Phylotrees are shown for isolates phenotypically resistant to (B) BDQ, (C) CFZ, (D) DLM, and (E) LZD. Tip point colours denote lineage. Each outer track represents a gene thought to be associated with resistance and coloured blocks denote the presence of a nonsynonymous mutation in the relevant gene for a given isolate. Mutations in these genes that are either associated with sensitivity or present in >5% of the collection of isolates as a whole were ignored. BDQ, bedaquiline; CFZ, clofazimine; DLM, delamanid; LZD, linezolid.
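The exclusion rule in the last sentence of this caption (drop mutations associated with sensitivity or present in more than 5% of all isolates) might be sketched as follows. The function name, the input layout, and the placeholder mutation labels are hypothetical and do not come from the compendium tables.

```python
def informative_mutations(mutation_counts, n_isolates, sensitive_associated,
                          max_frequency=0.05):
    """Return mutations kept for plotting, sorted by name.

    mutation_counts: mutation label -> number of isolates carrying it.
    sensitive_associated: labels already associated with sensitivity.
    A mutation is ignored if it is in `sensitive_associated` or if its
    frequency across all isolates exceeds `max_frequency` (5% here).
    """
    kept = []
    for mutation, count in mutation_counts.items():
        if mutation in sensitive_associated:
            continue  # associated with sensitivity: ignored
        if count / n_isolates > max_frequency:
            continue  # too common across the collection: ignored
        kept.append(mutation)
    return sorted(kept)


# Placeholder labels and counts, invented for illustration only.
toy_counts = {"geneA_m1": 3, "geneA_m2": 900, "geneB_m1": 10}
kept = informative_mutations(toy_counts, n_isolates=12289,
                             sensitive_associated={"geneB_m1"})
```

Here `geneA_m2` is dropped because 900 of 12,289 isolates is above the 5% threshold, and `geneB_m1` is dropped because it is flagged as sensitivity-associated.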

Fig 6. Rifampicin monoresistance.

(A) Percentage of RR isolates that are RMR by country of isolate origin. * indicates RMR proportions that were significantly different from that of the total dataset using a 2-tailed z-test with 95% confidence. (B) MDR predictions for RR isolates made using the Xpert MTB/RIF assay proxy. N is the total number of RR isolates. The inner ring shows the proportion of RR isolates that are RMR and MDR. The middle ring represents the proportions of RMR and MDR isolates that have an SNP (synonymous or nonsynonymous) in the RRDR of rpoB (“RRDR”), no RRDR SNP but an SNP elsewhere in the rpoB gene (“rpoB”), and no rpoB mutations (“none”). The outer ring shows the expected TP, TN, FP, and FN MDR predictions of Xpert MTB/RIF assay, based on the SNPs present in the RR isolates. (C) Nonsynonymous mutations found in the RRDR of rpoB in RMR isolates and MDR isolates. Presence of a coloured spot indicates that the mutation was found in RMR/MDR isolates, and spot size corresponds to the proportion of RMR or MDR isolates carrying that mutation. FN, false negative; FP, false positive; MDR, multidrug resistant; RRDR, rifampicin resistance–determining region; RMR, rifampicin monoresistant; RR, rifampicin resistant; SNP, single nucleotide polymorphism; TN, true negative; TP, true positive.
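A minimal sketch of the two calculations named in this caption follows. It assumes that the Xpert MTB/RIF proxy predicts MDR whenever an RR isolate carries any SNP in the RRDR, and that the z-test in (A) is a one-sample test of a country's RMR proportion against the proportion in the total dataset. Both readings are inferred from the caption rather than stated in it, and the function names and toy inputs are hypothetical.

```python
import math
from collections import Counter


def xpert_proxy_outcome(is_mdr, has_rrdr_snp):
    """Label one RR isolate as TP, FP, FN, or TN for MDR prediction.

    Assumption: the proxy calls an isolate MDR if it has an RRDR SNP.
    An isolate that is RR but not MDR is rifampicin monoresistant (RMR).
    """
    if has_rrdr_snp:
        return "TP" if is_mdr else "FP"
    return "FN" if is_mdr else "TN"


def proportion_z_test(successes, n, p0):
    """Two-tailed one-sample z-test of successes/n against proportion p0.

    Returns (z, p_value) using the normal approximation.
    """
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p-value
    return z, p_value


# Toy (is_mdr, has_rrdr_snp) pairs; invented, not compendium data.
toy_rr_isolates = [(True, True), (True, False), (False, True),
                   (False, False), (True, True)]
outcomes = Counter(xpert_proxy_outcome(m, s) for m, s in toy_rr_isolates)

# Toy example: 30 RMR among 100 RR isolates versus an overall proportion of 0.2.
z, p_value = proportion_z_test(30, 100, 0.2)
significant = p_value < 0.05  # corresponds to the caption's 95% confidence
```

Under this reading, RMR isolates with an RRDR SNP are the false positives: the assay detects rifampicin resistance correctly but the inference of MDR from it is wrong.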

References

    1. WHO. Global Tuberculosis Report 2020. 2021.
    2. World Health Organisation. EndTB Campaign. Available from: www.who.int/teams/global-tuberculosis-programme/the-end-tb-strategy.
    3. Shinnick TM, Starks AM, Alexander HL, Castro KG. Evaluation of the Cepheid Xpert MTB/RIF assay. Expert Rev Mol Diagn. 2015;15:9–22. doi: 10.1586/14737159.2015.976556
    4. Boehme CC, Nicol MP, Nabeta P, Michael JS, Gotuzzo E, Tahirli R, et al. Feasibility, diagnostic accuracy, and effectiveness of decentralised use of the Xpert MTB/RIF test for diagnosis of tuberculosis and multidrug resistance: a multicentre implementation study. Lancet. 2011;377:1495–1505. doi: 10.1016/S0140-6736(11)60438-8
    5. Makhado NA, Matabane E, Faccin M, Pinçon C, Jouet A, Boutachkourt F, et al. Outbreak of multidrug-resistant tuberculosis in South Africa undetected by WHO-endorsed commercial tests: an observational study. Lancet Infect Dis. 2018;18:1350–1359. doi: 10.1016/S1473-3099(18)30496-1