Genomic Analysis in the Age of Human Genome Sequencing - PubMed

Review

Genomic Analysis in the Age of Human Genome Sequencing

Tuuli Lappalainen et al. Cell. 2019.

Abstract

Affordable genome sequencing technologies promise to revolutionize the field of human genetics by enabling comprehensive studies that interrogate all classes of genome variation, genome-wide, across the entire allele frequency spectrum. Ongoing projects worldwide are sequencing many thousands, and soon millions, of human genomes as part of various gene mapping studies, biobanking efforts, and clinical programs. However, while genome sequencing data production has become routine, genome analysis and interpretation remain challenging endeavors with many limitations and caveats. Here, we review the current state of technologies for genetic variant discovery, genotyping, and functional interpretation and discuss the prospects for future advances. We focus on germline variants discovered by whole-genome sequencing, genome-wide functional genomic approaches for predicting and measuring variant functional effects, and implications for studies of common and rare human disease.

Copyright © 2019 Elsevier Inc. All rights reserved.

Figures

Figure 1.

The general framework of genome analysis in studies of human phenotypes, with the areas discussed in this review highlighted in blue.

Figure 2.

Overview of genome sequencing and variant detection approaches. The experimentally sequenced "test" genome contains two heterozygous SNVs, each located on a different chromosome (blue and red stars), one homozygous SNV (green stars), and a heterozygous deletion (dashed line). Reference alleles are represented by solid lines and black stars. The pan-genome graph representation at right requires prior knowledge of all shown variants.

Figure 3.

Allele frequency of SNVs from the gnomAD database (http://gnomad.broadinstitute.org). (A) Density plot showing the minor allele frequency (MAF) distribution, known as the "site frequency spectrum". (B) Cumulative distribution function of the site frequency spectrum, showing the fraction of variants (y-axis) with a frequency smaller than a given MAF (x-axis). Note that the left-most data point represents "singleton" variants present in only one person. These plots are based on a randomly sampled subset of ~19 million SNVs from gnomAD version 2.0, which in total includes ~188 million SNVs from 15,496 genomes.
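As a rough illustration of the two quantities plotted in Figure 3, the following minimal sketch tallies a site frequency spectrum and its cumulative distribution from alternate-allele counts. The function names and the toy counts are hypothetical and are not taken from gnomAD or from the review. The sketch also reports the fraction of variants at or below each MAF, which may differ slightly from the convention used in the figure.

```python
from collections import Counter


def site_frequency_spectrum(alt_counts, total_alleles):
    """Map each minor allele count to the number of variants that have it.

    alt_counts: alternate-allele count per variant (hypothetical input).
    total_alleles: number of chromosomes sampled (2 x diploid genomes).
    """
    # The minor allele is whichever allele is rarer at the site.
    minor_counts = [min(ac, total_alleles - ac) for ac in alt_counts]
    return dict(sorted(Counter(minor_counts).items()))


def cumulative_fraction(sfs, total_alleles):
    """Return (MAF, fraction of variants with MAF at or below it) pairs."""
    n_variants = sum(sfs.values())
    running = 0
    pairs = []
    for minor_count, n in sfs.items():
        running += n
        pairs.append((minor_count / total_alleles, running / n_variants))
    return pairs


# Toy data (assumption): 8 variants observed in 5 diploid genomes = 10 alleles.
toy_alt_counts = [1, 1, 1, 2, 5, 9, 1, 3]
sfs = site_frequency_spectrum(toy_alt_counts, total_alleles=10)
cdf = cumulative_fraction(sfs, total_alleles=10)

for maf, fraction in cdf:
    print(f"MAF <= {maf:.2f}: {fraction:.3f} of variants")
```

The first entry of `sfs` (minor allele count 1) corresponds to the singleton variants that form the left-most data point in the figure.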

Figure 4.

Annotation of genetic variants according to their type, position and downstream effects for SNVs and indels (A) and structural variants (B). The annotations shown are the most commonly used ones, and the potential effects on the protein are approximate, answering the question: "if a variant with a given annotation has any effect on gene function, what are the most likely processes?" The downstream effect indicates the change in the protein’s function in the cell. This illustration highlights the complexity of understanding even the proximal molecular effects of diverse types of genetic variants, and the challenge of building a biologically and medically meaningful understanding of their downstream effects.

Figure 5. Functional Annotation and Downstream Consequences of Structural Variants

Annotation of genetic variants according to their type, position, and downstream effects for structural variants.

Figure 6.

Illustration of the approaches to interpret molecular effects of genetic variants. (A) Straightforward overlap with tissue-specific annotations of genes and regulatory elements as well as genome constraint scores; (B) Predictive machine learning models of variant function; (C) Mapping of common variant associations to molecular phenotypes (left side) and their colocalization with GWAS associations; (D) Interrogating if a rare variant carrier is also an outlier with respect to a proximal molecular phenotype.
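The idea in panel (D) can be sketched in a few lines, assuming the molecular phenotype is a single expression value per sample and that "outlier" means an absolute z-score at or above a chosen cutoff. The sample names, the values, and the cutoff of 2.0 are hypothetical, and this is a simplified stand-in rather than the method used in the review. In particular, the carrier's own value is included when estimating the mean and standard deviation.

```python
from statistics import mean, stdev


def z_scores(expression):
    """Standardize per-sample expression values to z-scores."""
    mu = mean(expression.values())
    sigma = stdev(expression.values())
    if sigma == 0:
        # No variation across samples, so nobody can be an outlier.
        return {sample: 0.0 for sample in expression}
    return {sample: (value - mu) / sigma for sample, value in expression.items()}


def carrier_outliers(expression, carriers, threshold=2.0):
    """Return rare-variant carriers whose |z-score| meets the threshold."""
    z = z_scores(expression)
    return sorted(s for s in carriers if abs(z[s]) >= threshold)


# Toy data (assumption): expression of one gene in ten samples.
toy_expression = {
    "s1": 10, "s2": 11, "s3": 9, "s4": 10, "s5": 10,
    "s6": 11, "s7": 9, "s8": 10, "s9": 10, "s10": 2,
}
# Hypothetical carriers of a rare variant near the gene.
toy_carriers = {"s3", "s10"}

outliers = carrier_outliers(toy_expression, toy_carriers)
print(outliers)
```

Here only `s10` is flagged, because its expression lies far below the rest of the toy cohort while `s3` falls within the typical range.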

