The genomic and transcriptomic landscape of a HeLa cell line

G3 (Bethesda). 2013 Aug 7;3(8):1213-24.

doi: 10.1534/g3.113.005777.

Jonathan J M Landry, Paul Theodor Pyl, Tobias Rausch, Thomas Zichner, Manu M Tekkedil, Adrian M Stütz, Anna Jauch, Raeka S Aiyar, Gregoire Pau, Nicolas Delhomme, Julien Gagneur, Jan O Korbel, Wolfgang Huber, Lars M Steinmetz



Jonathan J M Landry et al. G3 (Bethesda). 2013.

Abstract

HeLa is the most widely used model cell line for studying human cellular and molecular biology. To date, no genomic reference for this cell line has been released, and experiments have relied on the human reference genome. Effective design and interpretation of molecular genetic studies performed using HeLa cells require accurate genomic information. Here we present a detailed genomic and transcriptomic characterization of a HeLa cell line. We performed DNA and RNA sequencing of a HeLa Kyoto cell line and analyzed its mutational portfolio and gene expression profile. Segmentation of the genome according to copy number revealed a remarkably high level of aneuploidy and numerous large structural variants at unprecedented resolution. Some of the extensive genomic rearrangements are indicative of catastrophic chromosome shattering, known as chromothripsis. Our analysis of the HeLa gene expression profile revealed that several pathways, including cell cycle and DNA repair, exhibit significantly different expression patterns from those in normal human tissues. Our results provide the first detailed account of genomic variants in the HeLa genome, yielding insight into their impact on gene expression and cellular function as well as their origins. This study underscores the importance of accounting for the strikingly aberrant characteristics of HeLa cells when designing and interpreting experiments, and has implications for the use of HeLa as a model of human biology.

Keywords: HeLa cell line; genomics; resource; transcriptomics; variation.


Figures

Figure 1


The genomic landscape of a HeLa cell line. (A) Circos plot (Krzywinski et al. 2009) of the HeLa genome with tracks representing read depth (100 kb-binned coverage), CN (color gradient from light green for CN1 to dark red for CN10), zygosity (pink: heterozygous; purple: homozygous), SNV density (1-Mb binned SNV count; darker blue for greater density), and translocation calls (colored arcs based on paired-end sequencing data: light blue; mate pair data: light green; both datasets: orange). (B) Histogram of called CN across the genome in percent. CN 0 corresponds to coverage less than half of the expected value for CN 1. A CN value of “NA” means no call could be made with confidence ≥0.95 (see Materials and Methods). (C) Overview of sequence variation in HeLa. Numbers of SNV and indel calls in HeLa, classified by overlap with dbSNP and the 1000 Genomes Project (dbSNP137). The y-axis shows the counts on a logarithmic scale. The four different classes of events represented on the x-axis are homozygous (“Hom.”) and heterozygous (“Het.”) SNVs and indels. (D) Variation observed in HeLa protein-coding genes relative to the human reference. Number of protein-coding genes containing SNVs, nonsynonymous SNVs, and damaging non-synonymous mutations [predicted by SIFT (Ng and Henikoff 2003)].
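
The binning and the CN 0 rule described in this caption can be illustrated with a minimal sketch. It assumes SNV positions are 1-based coordinates on a single chromosome and that an expected CN 1 coverage value is already available. The function names `snv_density` and `is_cn_zero` are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter

BIN_SIZE = 1_000_000  # 1-Mb bins, as used for the SNV density track


def snv_density(positions, bin_size=BIN_SIZE):
    """Count SNVs per fixed-width bin on one chromosome.

    positions: iterable of 1-based SNV coordinates (assumed input format).
    Returns a dict mapping bin index (0-based) to SNV count.
    """
    return dict(Counter((pos - 1) // bin_size for pos in positions))


def is_cn_zero(coverage, expected_cn1_coverage):
    """Apply the caption's rule: CN 0 means coverage below half of the
    value expected for CN 1. How other CN states are called is not
    described here, so this sketch covers only the CN 0 case."""
    return coverage < 0.5 * expected_cn1_coverage


if __name__ == "__main__":
    print(snv_density([1, 500_000, 1_000_000, 1_000_001, 2_500_000]))
    print(is_cn_zero(coverage=4.0, expected_cn1_coverage=10.0))
```

Integer division on `pos - 1` keeps position 1,000,000 in the first bin, which is one reasonable convention for 1-based coordinates.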

Figure 2


SVs, CN, zygosity and allele frequency along chromosomes 3 and 11. Arcs in the top panels labeled “Events” represent the predicted connections between fragments derived from SV calls based on read pair orientation and spacing. Different read pair signatures indicate the following event types: deletions, tandem duplications, inversions, and interchromosomal translocations. The center panel (“Copy Number”) represents the CN estimates in 10-kb bins (gray) overlaid with their segmentation (black). The associated CN is shown on the y-axis. The zygosity track shows the proportion of homozygous SNV calls in 10-kb bins; darker purple regions contain more homozygous calls (up to 100%) and indicate potential LOH. The bottom panel shows the allele frequency distribution as a heatmap in 10-kb bins on the chromosome axis and 5% bins on the allele frequency axis; darker blue indicates more SNVs with the given allele frequency in the corresponding 10 kb region. The color scale is according to the log of proportion of SNVs falling into the allele frequency bin (e.g., 10–15%, i.e., the row) in the 10 kb region (i.e., the column). The chromosomal subregion 11q13, which is known to contain tumor-suppressor genes, is delineated with black bars.
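
As a rough illustration of how the zygosity track and the allele frequency heatmap in this figure could be computed, the sketch below bins SNVs into 10-kb regions and 5% allele frequency bins. The input tuples, the function names, and the choice of base 10 for the logarithm are assumptions made for illustration; the caption does not specify them.

```python
import math
from collections import Counter, defaultdict

REGION_SIZE = 10_000  # 10-kb bins along the chromosome
N_AF_BINS = 20        # 5% bins on the allele frequency axis


def zygosity_track(snvs, region_size=REGION_SIZE):
    """snvs: iterable of (position, is_homozygous) with 1-based positions.

    Returns {region index: proportion of homozygous SNV calls}.
    """
    total = Counter()
    hom = Counter()
    for pos, is_hom in snvs:
        region = (pos - 1) // region_size
        total[region] += 1
        hom[region] += int(is_hom)
    return {region: hom[region] / total[region] for region in total}


def allele_frequency_heatmap(snvs, region_size=REGION_SIZE, n_af_bins=N_AF_BINS):
    """snvs: iterable of (position, alt_reads, total_reads), total_reads > 0.

    Returns {region index: {AF bin index: log10 of the proportion of SNVs
    in that region falling into that AF bin}}. Empty bins are omitted.
    """
    counts = defaultdict(Counter)
    for pos, alt, total in snvs:
        region = (pos - 1) // region_size
        # Integer arithmetic avoids floating-point errors at bin edges;
        # an allele frequency of exactly 100% goes into the last bin.
        af_bin = min(alt * n_af_bins // total, n_af_bins - 1)
        counts[region][af_bin] += 1
    heat = {}
    for region, bins in counts.items():
        n = sum(bins.values())
        heat[region] = {b: math.log10(k / n) for b, k in bins.items()}
    return heat
```

With this layout, a region whose zygosity value approaches 1.0 and whose heatmap mass sits in the top allele frequency bin would be a candidate for LOH, in line with the caption's reading of the darker purple regions.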

Figure 3


Colored HeLa karyotype by M-FISH. M-FISH results of 12 analyzed metaphase spreads identified a hypotriploid karyotype. The karyotype shown in (A) was derived from a single cell in which all aberrations were recurrent except for the one in chromosome 3. Single cell-specific events are shown in (B).

Figure 4


A general lack of dosage compensation observed in HeLa gene expression. (A) Correlation between CN and gene expression levels. Empirical cumulative distribution functions of gene expression values (for genes detected as expressed), grouped by CN state of the region containing the gene. The x-axis shows the logarithm (base 10) of read counts per kb of gene and the y-axis shows the corresponding cumulative distribution function. Significance (*) was calculated by the Wilcoxon test (P < 0.01). (B) Lack of allele-specific dosage compensation. For each SNV in genome segments of CN 3, the higher and lower RNA-Seq read count for both alleles are shown (higher count on the y-axis, lower count on the x-axis). The two dashed lines represent ratios 2:1 and 1:1. The observed ratios center around the 2:1 line, indicating an overall lack of allele-specific dosage compensation.
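
The reasoning in panel (B) can be made concrete with a small sketch. In a CN 3 segment, a heterozygous SNV has two copies of one allele and one of the other, so without allele-specific compensation the RNA-Seq read counts should sit near a 2:1 ratio. The code below computes the higher-to-lower ratio per SNV and an empirical cumulative distribution function like the one plotted in panel (A). The function names and the example counts are hypothetical, and the Wilcoxon test used in the figure is not reproduced here.

```python
from statistics import median


def allelic_ratios(read_counts):
    """read_counts: iterable of (allele_a_reads, allele_b_reads) per SNV.

    Returns the higher/lower read count ratio for each SNV, skipping
    SNVs where the lower count is zero (ratio undefined).
    """
    ratios = []
    for a, b in read_counts:
        hi, lo = max(a, b), min(a, b)
        if lo > 0:
            ratios.append(hi / lo)
    return ratios


def ecdf(values):
    """Empirical cumulative distribution function as (value, fraction) pairs."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


if __name__ == "__main__":
    # Made-up read counts for SNVs in a CN 3 segment, for illustration only.
    counts = [(40, 20), (22, 10), (10, 18)]
    print(median(allelic_ratios(counts)))  # a value near 2 suggests no compensation
```

A median ratio close to 2 in CN 3 segments matches the pattern the caption describes; a median near 1 would instead point toward compensation between alleles.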


References

    1. 1000 Genomes Project Consortium, Abecasis G. R., Auton A., Brooks L. D., DePristo M. A., et al., 2012. An integrated map of genetic variation from 1,092 human genomes. Nature 490: 56–65.
    2. Affymetrix ENCODE Transcriptome Project & Cold Spring Harbor Laboratory ENCODE Transcriptome Project, 2009. Post-transcriptional processing generates a diversity of 5′-modified long and short RNAs. Nature 457: 1028–1032.
    3. Aït Yahya-Graison E., Aubert J., Dauphinot L., Rivals I., Prieur M., et al., 2007. Classification of human chromosome 21 gene-expression variations in Down syndrome: impact on disease phenotypes. Am. J. Hum. Genet. 81: 475–491.
    4. Alekseev O. M., Richardson R. T., Alekseev O., O’Rand M. G., 2009. Analysis of gene expression profiles in HeLa cells in response to overexpression or siRNA-mediated depletion of NASP. Reprod. Biol. Endocrinol. 7: 45.
    5. Anders S., Huber W., 2010. Differential expression analysis for sequence count data. Genome Biol. 11: R106.
