Genome-wide detection of single-nucleotide and copy-number variations of a single human cell - PubMed

Chenghang Zong et al. Science. 2012.

Abstract

Kindred cells can have different genomes because of dynamic changes in DNA. Single-cell sequencing is needed to characterize these genomic differences but has been hindered by whole-genome amplification bias, resulting in low genome coverage. Here, we report on a new amplification method, multiple annealing and looping-based amplification cycles (MALBAC), that offers high uniformity across the genome. Sequencing MALBAC-amplified DNA achieves 93% genome coverage ≥1x for a single human cell at 25x mean sequencing depth. We detected digitized copy-number variations (CNVs) of a single cancer cell. By sequencing three kindred cells, we were able to identify individual single-nucleotide variations (SNVs), with no false positives detected. We directly measured the genome-wide mutation rate of a cancer cell line and found that purine-pyrimidine exchanges occurred unusually frequently among the newly acquired SNVs.

Conflict of interest statement

Competing Financial Interests

CZ, SL and XSX are authors on a patent applied for by Harvard University that covers the MALBAC technology.

Figures

Figure 1

MALBAC single-cell whole-genome amplification. A single cell is picked and lysed. First, the genomic DNA of the single cell is melted into single-stranded DNA molecules at 94°C. MALBAC primers then anneal randomly to the single-stranded DNA molecules at 0°C and are extended by a polymerase with strand-displacement activity at elevated temperatures, creating semi-amplicons. In the following five temperature cycles, after the step of looping the full amplicons, single-stranded amplicons and the genomic DNA are used as templates to produce full amplicons and additional semi-amplicons, respectively. For full amplicons, the 3′ end is complementary to the sequence on the 5′ end. The two ends hybridize to form looped DNA, which efficiently prevents the full amplicon from being used as a template and therefore ensures close-to-linear amplification. After the five cycles of linear preamplification, only the full amplicons can be exponentially amplified in the following PCR, using the common 27-nucleotide sequence as the primer. The PCR generates microgram quantities of DNA for sequencing experiments.
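The effect of looping on template usage can be illustrated with a toy molecule count. This is only a sketch under simplifying assumptions that are not from the paper: one priming event per template per cycle, perfect efficiency, and a hypothetical function name `preamplify`. It is not the authors' kinetic model; it only shows that removing full amplicons from the template pool stops the compounding growth that ordinary PCR would give.

```python
def preamplify(cycles, looping=True):
    """Toy count of semi- and full amplicons per priming site.

    Assumptions (illustrative only): every cycle the genomic DNA yields one
    new semi-amplicon, and every existing semi-amplicon is copied into one
    new full amplicon. With looping, full amplicons are never templates.
    """
    semi, full = 0, 0
    for _ in range(cycles):
        new_semi = 1          # genomic DNA is primed again each cycle
        new_full = semi       # each semi-amplicon is copied once
        if not looping:
            new_full += full  # unprotected full amplicons would be copied too
        semi += new_semi
        full += new_full
    return semi, full


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, preamplify(n, looping=True), preamplify(n, looping=False))
```

In this toy model the number of full amplicons grows polynomially with looping and roughly doubles each cycle without it, which is the qualitative contrast the caption draws between the preamplification cycles and the later exponential PCR.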

Figure 2

Characterization of amplification uniformity. (A) Histograms of reads over the entirety of Chromosome 1 of a single cell from the SW480 cancer cell line and the zoom-in of a ~8 million base region (chr1: 62,023,147–70,084,845). (B) Lorenz curves of MALBAC, MDA and bulk sample. A Lorenz curve gives the cumulative fraction of reads as a function of the cumulative fraction of the genome. Perfectly uniform coverage would result in a diagonal line, and a large deviation from the diagonal indicates biased coverage. The green and blue arrows indicate the uncovered fractions of the genome for MALBAC and MDA, respectively. All samples are sequenced at 25x depth. (C) Power spectrum of read density throughout the genome (as a function of spatial frequency). MALBAC performs similarly to bulk, while the MDA spectrum shows high amplitude at low frequency, demonstrating that regions of several megabases suffer from under- and over-amplification. This observation is consistent with the variations of read depth in Fig. S3 (SOM).
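The Lorenz curve described in panel B can be computed from per-bin read counts as sketched below. The function names `lorenz_curve` and `uncovered_fraction` and the example counts are hypothetical; the sketch assumes the genome has already been divided into equal-sized bins with a read count for each.

```python
def lorenz_curve(read_counts):
    """Return (x, y): cumulative fraction of genome vs. cumulative fraction of reads.

    Bins are sorted from least to most covered, so uniform coverage gives
    the diagonal y = x and biased coverage bows below it.
    """
    counts = sorted(read_counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads in any bin")
    n = len(counts)
    xs, ys = [0.0], [0.0]
    running = 0
    for i, c in enumerate(counts, start=1):
        running += c
        xs.append(i / n)
        ys.append(running / total)
    return xs, ys


def uncovered_fraction(read_counts):
    """Fraction of bins with zero reads (the flat start of the curve)."""
    return sum(1 for c in read_counts if c == 0) / len(read_counts)


if __name__ == "__main__":
    uniform = [5, 5, 5, 5]   # made-up counts, perfectly even
    biased = [0, 0, 2, 8]    # made-up counts, half the bins uncovered
    print(lorenz_curve(uniform))
    print(lorenz_curve(biased), uncovered_fraction(biased))
```

The uncovered fraction corresponds to the stretch where the curve stays at zero, which is what the arrows in the panel mark.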

Figure 3

CNVs of single cancer cells. Digitized copy numbers across the genome are plotted for three single cells (Panels A to C) as well as the bulk sample (Panel D) from the SW480 cancer cell line. The bottom panel shows the result based on MDA amplification (Panel E). Green lines are fitted copy numbers obtained from the hidden Markov model (SOM). The single cells are sequenced at only 0.8x depth, while the bulk and MDA samples are sequenced at 25x. CNV analyses of more single cells are included in the SOM (Fig. S4). The regions within the dashed box exhibit the CNV differences among single cells and the bulk, which cannot be resolved by MDA. The binning window is 200 kb.
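A rough sketch of the binning step follows. The 200 kb window is from the caption, but the rest is an assumption: instead of the hidden Markov model used in the paper (described in the SOM and not reproduced here), this sketch simply scales each bin to the median and rounds, with a hypothetical `baseline_ploidy` of 2 that may not suit an aneuploid line such as SW480.

```python
from collections import Counter
from statistics import median

BIN_SIZE = 200_000  # 200 kb binning window, as stated in the caption


def bin_reads(positions, bin_size=BIN_SIZE):
    """Count reads per fixed-size bin from read start positions on one chromosome."""
    return Counter(pos // bin_size for pos in positions)


def digitized_copy_number(bin_counts, baseline_ploidy=2):
    """Naive digitization: scale counts so the median bin equals the baseline, then round.

    This is a stand-in for the HMM fit, not the authors' method.
    """
    med = median(bin_counts)
    if med == 0:
        raise ValueError("median bin count is zero; cannot normalize")
    return [round(baseline_ploidy * c / med) for c in bin_counts]


if __name__ == "__main__":
    print(bin_reads([10, 199_999, 200_000, 450_000]))
    print(digitized_copy_number([100, 102, 98, 150, 51, 100]))
```

Rounding each bin independently is noisy at low depth; an HMM smooths across neighboring bins, which is presumably why the authors fit one.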

Figure 4

Calling newly acquired SNVs and estimation of the mutation rate of a cancer cell line (SW480). (A) Experiment design. A single ancestor cell is chosen and cultured for ~20 generations. The vast majority of cells are used to extract DNA for bulk sequencing to represent the ancestor cell’s genome. A single cell from this culture is chosen for another expansion of four generations. The kindred cells are isolated for single-cell whole-genome amplification. Single-cell samples C1, C2, and C3 are used for high-throughput sequencing. Samples C4, C5, and C6 are used for verifying SNVs with Sanger sequencing. (B) 3D p-value plot of a one-sided binomial test for SNV candidates from the three kindred cells. The black dots are the false positives due to uncorrelated amplification errors; all of them are on the x, y, and z axes and the x-y, y-z, and x-z planes. Outside of the three planes, the 166 green dots are the residual false positives due to correlated errors from homopolymers, tandem repeats, high-GC content and high-density SNV regions, and the 35 red dots are the newly acquired SNVs during the 20 generations of clonal expansion (SOM). We note that the homozygous SNVs are located at the (1,1,1) position. (C) Locations of the 35 newly acquired SNVs on the chromosomes of a single cell (SOM). (D) Next-generation sequencing data of a newly acquired SNV. The SNV (C→G) exists in the high-throughput data of all three kindred cells but not in the bulk data. (E) Sanger sequencing data of single cells C4, C5, and C6 confirms that this SNV is not a false positive, while the Sanger sequencing of the bulk confirms that this SNV is not a false negative of next-generation sequencing of the bulk (i.e., this SNV is indeed absent in the bulk).
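The three-cell filter in panel B can be sketched as follows. The caption does not define the null hypothesis or the tail of the test, so everything specific here is an assumption: the sketch uses the lower-tail probability of the variant read count under a heterozygous expectation of 0.5, which is one reading consistent with homozygous sites landing at (1,1,1) and sites without variant reads landing near 0. The names `cell_p_value` and `off_all_planes` and the threshold `eps` are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), computed exactly from the definition."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def cell_p_value(variant_reads, depth, het_fraction=0.5):
    """Assumed one-sided test: lower-tail probability of the observed variant count.

    All variant reads gives a value near 1; no variant reads gives a value near 0.
    """
    if depth == 0:
        return 0.0
    return binom_cdf(variant_reads, depth, het_fraction)


def off_all_planes(p_values, eps=1e-3):
    """True when a candidate is away from every axis and plane of the 3D plot,
    i.e. all three kindred cells show support for the variant."""
    return all(p > eps for p in p_values)


if __name__ == "__main__":
    shared = [cell_p_value(12, 25)] * 3                   # seen in all three cells
    one_cell = [cell_p_value(12, 25), cell_p_value(0, 25), cell_p_value(0, 25)]
    print(off_all_planes(shared), off_all_planes(one_cell))
```

Requiring support in all three cells removes uncorrelated amplification errors, but, as the caption notes, correlated errors survive this filter and need separate handling.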
