Genome-wide quantification of rare somatic mutations in normal human tissues using massively parallel sequencing

Margaret L Hoang et al. Proc Natl Acad Sci U S A. 2016.

Abstract

We present the bottleneck sequencing system (BotSeqS), a next-generation sequencing method that simultaneously quantifies rare somatic point mutations across the mitochondrial and nuclear genomes. BotSeqS combines molecular barcoding with a simple dilution step immediately before library amplification. We use BotSeqS to show age- and tissue-dependent accumulations of rare mutations and demonstrate that somatic mutational burden in normal human tissues can vary by several orders of magnitude, depending on biologic and environmental factors. We further show major differences between the mutational patterns of the mitochondrial and nuclear genomes in normal tissues. Lastly, the mutation spectra of normal tissues were different from each other, but similar to those of the cancers that arose in them. This technology can provide insights into the number and nature of genetic alterations in normal tissues and can be used to address a variety of fundamental questions about the genomes of diseased tissues.

Keywords: aging; genomics; next-generation sequencing; somatic mutation.

Conflict of interest statement

B.V. has no conflict of interest with respect to the new technology described in this manuscript, as defined by Johns Hopkins University's policy on conflict of interest. B.V. is a founder of PapGene and Personal Genome Diagnostics and a member of the Scientific Advisory Boards of Morphotek and Sysmex-Inostics. These companies and others have licensed patent applications on genetic technologies from Johns Hopkins, some of which result in royalty payments to B.V. The terms of these arrangements are being managed by Johns Hopkins University in accordance with its conflict of interest policies.

Figures

Fig. 1.

Bottleneck sequencing methodology. Each color at the top of the figure represents double-stranded DNA from the genome of one cell within a population. Random, nonclonal point mutations (red) are private to individual cells. In contrast, clonal reference changes (A in black) are present in all genomes within the cell population. (step 1) Random shearing generates variably sized DNA molecules. (step 2) Noncomplementary single-stranded regions of the Illumina Y-adapters (P5 in gray and P7 in black) are represented as forked structures ligated to both ends of each DNA molecule. (step 3) Dilution randomly reduces the number of DNA molecules from the original population (five are shown). The ends of the DNA molecules align uniquely to the reference genome, and these mapping coordinates are used as unique molecule “barcodes” during data processing. (step 4) A PCR primer (black arrowhead) anneals to and extends (hashed lines) the Watson and Crick templates of the original DNA molecule independently. The red asterisk represents an error generated during PCR of the library. (step 5) The Watson and Crick templates generate two families of PCR duplicates. The orientation of the P5-containing (gray) and P7-containing (black) adapters relative to the DNA molecule (insert) distinguishes the two families. The P5 and P7 sequences dictate which end will be sequenced in read 1 vs. read 2, respectively, on the Illumina flow cell. Red asterisks represent the PCR error propagated in the Watson but not the Crick family members. In contrast to artifacts, real mutations (C:G mutation in red) will be present in both the Watson and Crick family members. (step 6) The BotSeqS pipeline identifies and quantifies the number of unique DNA molecules and point mutations (C:G in red) in the sequencing data by eliminating artifacts and clonal changes (A:T in black).
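
The logic of steps 3 to 6 can be illustrated with a minimal Python sketch. This is not the BotSeqS pipeline itself: the read representation, the function name `call_duplex_mutations`, the `min_family_size` threshold, and the rule that a variant must appear in every read of a family are all assumptions made for illustration. Reads are grouped by their mapping coordinates (the molecule "barcode"), split into Watson and Crick families, and a variant is kept only if it appears in both families. Removal of clonal changes is not covered here.

```python
from collections import defaultdict


def call_duplex_mutations(reads, min_family_size=2):
    """Keep variants supported by both strand families of a molecule.

    Each read is a dict with hypothetical keys:
      chrom, start, end : mapping coordinates, used as the molecule barcode
      strand            : "W" (Watson family) or "C" (Crick family)
      variants          : set of (position, ref, alt) in reference coordinates
    min_family_size must be at least 1.
    """
    families = defaultdict(lambda: {"W": [], "C": []})
    for read in reads:
        barcode = (read["chrom"], read["start"], read["end"])
        families[barcode][read["strand"]].append(frozenset(read["variants"]))

    calls = {}
    for barcode, family in families.items():
        # Skip molecules without enough PCR duplicates on either strand.
        if min(len(family["W"]), len(family["C"])) < min_family_size:
            continue
        # Assumed rule: a variant must be in every read of a family.
        watson = frozenset.intersection(*family["W"])
        crick = frozenset.intersection(*family["C"])
        # A real mutation is present in both families; a PCR error is not.
        calls[barcode] = watson & crick
    return calls


# Toy input: one molecule, with a PCR error at position 110 in one Watson read.
reads = [
    {"chrom": "chr1", "start": 100, "end": 250, "strand": "W",
     "variants": {(105, "C", "T"), (110, "A", "G")}},
    {"chrom": "chr1", "start": 100, "end": 250, "strand": "W",
     "variants": {(105, "C", "T")}},
    {"chrom": "chr1", "start": 100, "end": 250, "strand": "C",
     "variants": {(105, "C", "T")}},
    {"chrom": "chr1", "start": 100, "end": 250, "strand": "C",
     "variants": {(105, "C", "T")}},
]
calls = call_duplex_mutations(reads)
```

In this toy input the variant at position 105 is retained because both families carry it, while the variant at position 110 is dropped because it appears in only one Watson read.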

Fig. 2.

Nuclear point mutations increase in normal tissues from individuals with defects in DNA repair or with exposure to environmental carcinogens compared with controls. (A) Comparison of point mutation prevalences in nuclear (Left) and mitochondrial (Right) genome in age-matched normal colon epithelium (filled circle) with different DNA mismatch repair genotypes (PMS2 +/+ or PMS2 −/−) or in age-matched normal kidney cortex (filled square) without (none) or with (aristolochic acid or smoking) carcinogen exposure. Red lines represent average. *P < 0.05, t test; **P < 0.001 and ***P < 0.0001, one-way ANOVA with Bonferroni multiple comparison posttest; ns, not significant, indicates P > 0.05. (B) Stacked columns representing the substitution frequencies (y axis) of each substitution out of the six possible types (see legend). Cohort labels are indicated in A directly above each column. Number of substitutions (N) generating each mutational spectrum is indicated on the x axis. n.d., not determined due to an insufficient number of mutations (N = 7) for mutational spectrum analysis. *P = 0.04, Fisher’s exact test; **P = 2.6 × 10^−8 and ***P = 1.5 × 10^−16, Fisher’s exact test with Bonferroni multiple comparison correction; ns, not significant, indicates P > 0.05. All statistical tests in this figure were two-tailed.

Fig. 3.

Normal human tissues accumulate point mutations over a lifetime with genome-specific and tissue-specific mutational patterns. Point mutation prevalences in nuclear (Top) and mitochondrial (Bottom) genome measured in four normal tissue types (brain frontal cortex of 9 individuals, kidney cortex of 5 individuals, colon epithelium of 11 individuals, and duodenum of 1 individual). Twenty-six total individuals were assessed, with each individual contributing to one normal tissue type. Pie chart Insets show the prevalences of each substitution out of the six possible substitution types (see pie chart legend, right side). Each pie chart was compiled from the individuals represented in their respective scatter plots, with the exception that duodenum was omitted. The number of substitutions generating the pie charts for the nuclear genome was n = 31 for brain, n = 73 for kidney, and n = 94 for colon, and for the mitochondrial genome was n = 181 for brain, n = 299 for kidney, and n = 116 for colon.
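
The "six possible substitution types" arise because a substitution and its reverse complement describe the same change on double-stranded DNA, so the 12 possible single-base changes collapse to 6 classes. The sketch below shows one way to tabulate such a spectrum. The function names are hypothetical, and the choice to report each class from the pyrimidine (C or T) reference base is a commonly used convention assumed here, not one confirmed by the figure legend.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Collapse a substitution to one of six strand-independent classes."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    if ref in ("A", "G"):
        # Purine reference: report the change as seen on the opposite strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(substitutions):
    """Return the fraction of substitutions falling in each class."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in substitutions)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Toy input: G>A collapses to C>T, and A>G collapses to T>C.
example = spectrum([("G", "A"), ("C", "T"), ("A", "G"), ("T", "C")])
```

Here `example` contains two classes, `C>T` and `T>C`, each with a fraction of 0.5.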
