Whole-genome sequencing of nine esophageal adenocarcinoma cell lines

Abstract

Esophageal adenocarcinoma (EAC) is highly mutated and molecularly heterogeneous. The number of cell lines available for study is limited, and their genomes have been only partially characterized. An accurate annotation of their mutational landscape is crucial for sound experimental design and correct interpretation of genotype-phenotype findings. We performed high-coverage, paired-end whole genome sequencing on eight EAC cell lines (ESO26, ESO51, FLO-1, JH-EsoAd1, OACM5.1 C, OACP4 C, OE33 and SK-GT-4), all verified against original patient material, and on one esophageal high-grade dysplasia cell line, CP-D. We have made the aligned sequence data available and report single nucleotide variants (SNVs), small insertions and deletions (indels) and copy number alterations, identified by comparison with the human reference genome and known single nucleotide polymorphisms (SNPs). We compare these putative mutations with those found in primary EAC tissue samples to inform the use of these cell lines as models of EAC.

Keywords: Esophageal adenocarcinoma, whole genome sequencing, cell line, high-grade dysplasia, cancer genome, copy number alteration, single nucleotide variant

Introduction

Esophageal adenocarcinoma (EAC), including cancers of the gastro-esophageal junction, represents a substantial health concern in Western countries because of its increasing incidence and poor prognosis. To date, there are no widely accepted animal models for EAC, and only a limited number of cell lines are available for in vitro functional studies. Recent genome-wide sequencing projects have shown that EAC is one of the most highly mutated solid cancers, with a high degree of heterogeneity (Dulak et al., 2013; Weaver et al., 2014). In addition to point mutations, there are widespread copy number alterations, with evidence of catastrophic events such as chromothripsis and breakage-fusion-bridge cycles in about one-third of cases (Nones et al., 2014). An accurate annotation of the mutational landscape of available EAC cell lines is therefore crucial for optimal experimental design, for interpretation of genotype-phenotype data and for analysis of drug sensitivities. We selected eight EAC cell lines (ESO26, ESO51, FLO-1, JH-EsoAd1, OACM5.1 C, OACP4 C, OE33 and SK-GT-4), whose identities have been verified against the original tumors by short tandem repeat (STR) analysis, TP53 mutation status and xenograft histology (Boonstra et al., 2010), and one esophageal high-grade dysplasia cell line (CP-D). We performed high-coverage paired-end whole genome sequencing and aligned the sequence data to the human reference genome to detect single nucleotide variants, indels and copy number alterations.

Materials and methods

Ethics

Cell lines were obtained from commercial repositories, except JH-EsoAd1, which was a kind gift from Hector Alvarez (Table 1).

Table 1. Characteristics and clinico-pathological features of the EAC cell lines analysed.

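Verified origin identifies cell lines whose pathological origin from EAC has been verified in Boonstra et al., 2010. HGD, high-grade dysplasia; GOJ, gastro-oesophageal junction; hTERT, human telomerase reverse transcriptase.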

| Cell line | Alternative names | Age | Sex | Ethnicity | Histology | Date derived | Stage | Ploidy | Commercial availability | Verified origin | Ref |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CP-D | CP-18821 | Adult | M | | hTERT-immortalized oesophageal HGD | 1995 | HGD | hypotetraploid | ATCC | | Palanca-Wessels et al., 1998 |
| ESO26 | | 56 | M | Caucasian | GOJ adenocarcinoma | 2000 | Stage IV | hypodiploid (1.8) | Public Health England Culture Collection | YES | Boonstra et al., 2010 |
| ESO51 | | 74 | M | Caucasian | Distal oesophageal adenocarcinoma | 2000 | Stage IV | hypotriploid (2.75) | Public Health England Culture Collection | YES | Boonstra et al., 2010 |
| FLO-1 | | 68 | M | Caucasian | Distal oesophageal adenocarcinoma | 1991 | | hypodiploid (1.9) | Public Health England Culture Collection | YES | Hughes et al., 1997 |
| JH-EsoAd1 | JHAD1 | 66 | M | Caucasian | Moderately to poorly differentiated oesophageal adenocarcinoma | 1997 | Stage IIA (T3 N0 M0) | triploid | No (due to be deposited in ATCC) | YES | Alvarez et al., 2008 |
| OACM5.1 C | | 47 | F | Caucasian | Lymph node metastases of distal oesophageal adenocarcinoma | 2001 | Stage IV | hypodiploid | Public Health England Culture Collection | YES | de Both et al., 2001 |
| OACP4 C | | 55 | M | Caucasian | Gastric cardia adenocarcinoma | 2001 | Stage IV | Aneuploid (53–57 chromosomes) | Public Health England Culture Collection | YES | de Both et al., 2001 |
| OE33 | JROECL33 | 73 | F | | Distal oesophageal adenocarcinoma | 1993 | Stage IIA | hypotetraploid (3.5) | Public Health England Culture Collection | YES | Rockett et al., 1997 |
| SK-GT-4 | | 83 | M | | Distal oesophageal adenocarcinoma | 1989 | Stage IIB | Aneuploid (mode 59 chromosomes, SK | Public Health England Culture Collection | YES | Altorki et al., 1993 |

Cell lines

All cell lines were from a certified source (Table 1) and were verified in house as a >90% match to publicly reported STR profiles. Cell lines were tested for mycoplasma and grown under the standard conditions recommended by the repositories listed in Table 1. Matched germline DNA was not available.

Library preparation, sequencing and QC

Genomic DNA was prepared from cultured cells with the AllPrep DNA/RNA Mini Kit (Qiagen) according to the manufacturer's instructions. A single library was created for each sample, and 90-bp paired-end sequencing was performed at the Beijing Genomics Institute (BGI, Guangdong, China) according to Illumina (CA, USA) instructions, to a typical depth of 30×, with 94% of the known genome sequenced to at least 10× coverage and a Phred quality of at least 30 for at least 80% of mapped bases. FastQC 0.11.2 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc) was used to assess the quality of the sequence data. Additional alignment, duplication and insert size quality metrics are reported in Supplementary material 7. Sequence reads were mapped to the human reference genome (Ensembl GRCh37, release 84) using BWA 0.5.9 (Li, 2009), sorted into genome coordinate order and duplicate-marked using Picard 1.105 (FixMateInformation and MarkDuplicates tools, respectively; http://broadinstitute.github.io/picard). Original BAM files are available in the European Bioinformatics Institute (EBI) repository (project: PRJEB14018; sample accessions: ERS1158075–ERS1158083).
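The coverage and quality figures above are simple proportions. The minimal Python sketch below shows how such summaries can be computed from per-position depths and per-base Phred scores, and how a target mean depth translates into a number of read pairs. The toy input lists and the approximate 3.1 Gb genome size are illustrative assumptions, not values from this study, whose QC relied on FastQC and Picard rather than custom code.

```python
"""Illustrative sequencing-QC arithmetic (not the study's actual QC code)."""


def fraction_at_least(values, threshold):
    """Fraction of entries in `values` that are >= threshold."""
    values = list(values)
    if not values:
        raise ValueError("no values supplied")
    return sum(v >= threshold for v in values) / len(values)


def phred_to_error_probability(q):
    """A Phred score Q encodes a base-call error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def read_pairs_for_depth(mean_depth, genome_size, read_length=90):
    """Read pairs needed for a mean depth: depth * genome size / bases per pair.

    Each paired-end fragment contributes 2 * read_length sequenced bases.
    Ignores duplicates, unmapped reads and overlapping mates.
    """
    return mean_depth * genome_size / (2 * read_length)


if __name__ == "__main__":
    depths = [0, 8, 10, 12, 30, 31, 29, 45, 9, 33]    # toy per-position depths
    quals = [38, 40, 12, 30, 35, 22, 37, 39, 30, 41]  # toy per-base Phred scores
    print(f"positions >= 10x: {fraction_at_least(depths, 10):.0%}")
    print(f"bases >= Q30: {fraction_at_least(quals, 30):.0%}")
    print(f"error probability at Q30: {phred_to_error_probability(30)}")
    # 3.1e9 bp is an assumed, approximate human genome size
    print(f"read pairs for 30x: {read_pairs_for_depth(30, 3.1e9):.2e}")
```

At Q30 the expected error rate is one in a thousand base calls, which is why a Q30 fraction is a common headline quality metric.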

Mutation calling

GATK v3.2.2 (Broad Institute, MA, USA) was used to call and filter single nucleotide and indel variants relative to the reference genome. In brief, the steps were: 1) local realignment of reads around indels to correct misalignments, using the GATK RealignerTargetCreator and IndelRealigner tools; 2) recalibration of base quality scores using the GATK BaseRecalibrator tool; 3) SNV and indel calling using GATK HaplotypeCaller, which determines haplotypes by local re-assembly within active regions (i.e. regions showing evidence of variation) and assigns genotypes using a Bayesian approach. Hard filters were applied to the resulting call set following the recommendations in the GATK documentation (https://www.broadinstitute.org/gatk) to generate a high-confidence set of SNV and indel calls. These calls were annotated with genomic features and protein-coding consequences using the Ensembl Variant Effect Predictor (release 75, http://www.ensembl.org/info/docs/tools/vep/index.html; Supplementary material 4). For the purposes of the analysis, all variants described in the 1000 Genomes Project with a global minor allele frequency (GMAF) >0.0014 were separated out as likely germline polymorphisms (The 1000 Genomes Project Consortium et al., 2012), following the criteria adopted in the COSMIC Cell Lines Project (Wellcome Trust Sanger Institute, Cambridge). We further removed all SNPs with a reported minor allele frequency in dbSNP (Ensembl v.58) and all variants with a frequency ≥0.00025 in ESP6500 (NHLBI GO Exome Sequencing Project, released June 20th 2012). A full list of the filtered variants is available in Supplementary material 4 and Supplementary material 6.
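The hard-filtering step can be illustrated with a small Python sketch that parses a VCF INFO field and applies threshold rules. The thresholds below are the generic hard-filter values that the GATK documentation recommended for GATK 3 call sets, as we understand them (SNPs: QD < 2.0, FS > 60.0, MQ < 40.0, MQRankSum < -12.5, ReadPosRankSum < -8.0; indels: QD < 2.0, FS > 200.0, ReadPosRankSum < -20.0). The study does not report the exact values it used, so treat them as assumptions; in practice this step is run with GATK's VariantFiltration tool rather than custom code.

```python
"""Sketch of GATK-style hard filtering on VCF INFO annotations (assumed thresholds)."""

import operator

# (annotation, comparison that FAILS the variant, threshold); assumed values.
SNP_RULES = [
    ("QD", operator.lt, 2.0),
    ("FS", operator.gt, 60.0),
    ("MQ", operator.lt, 40.0),
    ("MQRankSum", operator.lt, -12.5),
    ("ReadPosRankSum", operator.lt, -8.0),
]
INDEL_RULES = [
    ("QD", operator.lt, 2.0),
    ("FS", operator.gt, 200.0),
    ("ReadPosRankSum", operator.lt, -20.0),
]


def parse_info(info):
    """Parse a VCF INFO string such as 'QD=1.5;FS=3.2;DB' into a dict.

    Numeric values become floats, flags (no '=') map to True and
    non-numeric values (e.g. 'AC=1,2') are kept as strings.
    """
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        if not sep:
            fields[key] = True
            continue
        try:
            fields[key] = float(value)
        except ValueError:
            fields[key] = value
    return fields


def failed_filters(info, is_indel):
    """Names of the rules a variant fails; missing annotations are treated as passing."""
    annotations = parse_info(info)
    rules = INDEL_RULES if is_indel else SNP_RULES
    failed = []
    for name, fails, threshold in rules:
        value = annotations.get(name)
        if isinstance(value, float) and fails(value, threshold):
            failed.append(name)
    return failed


def filter_column(info, is_indel):
    """Value for the VCF FILTER column: 'PASS' or the failed rule names."""
    failed = failed_filters(info, is_indel)
    return ";".join(failed) if failed else "PASS"


if __name__ == "__main__":
    print(filter_column("QD=1.5;FS=70.0;MQ=59.0", is_indel=False))
    print(filter_column("QD=25.0;FS=0.5;MQ=60.0;DB", is_indel=False))
```

Separate SNP and indel rule sets mirror the GATK practice of filtering the two variant types independently, because strand-bias (FS) and position annotations behave differently for indels.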

Copy number assessment

Copy number (CN) analysis was carried out using Control-FREEC (Boeva et al., 2012). Control-FREEC computes and segments CN profiles and can characterize genomes with ploidy above two; in the absence of a control sample, it normalizes read counts using GC-content and mappability profiles. Ploidy in each cell line was assessed interactively with the Crambled app v.2.0 according to the methods described by Lynch (2015).
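To illustrate what control-free normalization involves, the pure-Python sketch below groups genomic windows into narrow GC-content strata, divides each window's read count by the median count of its stratum and scales the result by an assumed baseline ploidy. This swaps in a simple median-per-GC-stratum correction; Control-FREEC itself fits a regression model of read count against GC content (and uses mappability and segmentation), so this is an illustration of the idea, not its implementation. The stratum width and the assumption that most windows in each stratum sit at the baseline copy state are ours.

```python
"""Control-free GC correction of binned read counts (simplified illustration)."""

import math
from collections import defaultdict
from statistics import median


def gc_corrected_copy_number(counts, gc, ploidy, stratum_width=0.01):
    """Estimate per-window copy number from raw read counts and GC fraction.

    Windows are grouped into GC strata of `stratum_width`. Each count is
    divided by the median count of its stratum (assumed to represent the
    baseline copy state) and multiplied by the assumed baseline ploidy.
    """
    if len(counts) != len(gc):
        raise ValueError("counts and gc must have the same length")
    keys = [math.floor(g / stratum_width) for g in gc]
    strata = defaultdict(list)
    for key, c in zip(keys, counts):
        strata[key].append(c)
    stratum_median = {key: median(vals) for key, vals in strata.items()}
    return [
        ploidy * c / stratum_median[key] if stratum_median[key] > 0 else float("nan")
        for key, c in zip(keys, counts)
    ]


def log2_ratio(copy_number, ploidy):
    """log2(CN / ploidy): positive for gain, negative for loss, -inf for CN 0."""
    return [math.log2(cn / ploidy) if cn > 0 else float("-inf") for cn in copy_number]


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    gc = [rng.uniform(0.35, 0.60) for _ in range(5000)]         # synthetic GC per window
    true_cn = [2 if i % 5 else 4 for i in range(5000)]          # every 5th window gained
    bias = [100 * (1 + 3 * (g - 0.45) ** 2 - 0.8 * (g - 0.45)) for g in gc]  # synthetic GC bias
    counts = [b * cn / 2 for b, cn in zip(bias, true_cn)]
    est = gc_corrected_copy_number(counts, gc, ploidy=2)
    print(median(e for e, t in zip(est, true_cn) if t == 2),
          median(e for e, t in zip(est, true_cn) if t == 4))
```

The log2 ratio relative to ploidy is the kind of quantity plotted in Figure 2A; a value of 1 corresponds to a doubling of copy number relative to the baseline.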

Dataset validation

Whole genome sequencing

We identified a median of 1.3 × 10⁵ variants per cell line across the 9 cell lines (range 105,487–151,879; Figure 1a, Table 2, Supplementary material 3, Supplementary material 4). Of these, 1.5% were in coding regions, 4% in regulatory regions (as defined in Zerbino et al. (2015)), 41% in introns and 23% in intergenic regions, with most of the remainder located upstream or downstream of genes (Table 2). Among the variants classed as coding in Table 2, the majority (57.4%) were in UTRs, followed by missense (21%) and synonymous (11%) variants (Figure 1b, Table 2, Supplementary material 3, Supplementary material 4). The number of variants identified in the high-grade dysplasia line CP-D was not significantly lower than the median of the EAC cell lines, consistent with the finding that such pre-malignant lesions have already accumulated many SNVs (Weaver et al., 2014). OACP4 C and ESO26 showed the smallest and largest numbers of variants, respectively (Figure 1, Table 2).

Figure 1. Distribution of detected variants and coding sequence consequences (mean percentage value).


A) Bar chart showing the distribution of called variants across the genomic regions indicated; B) details of the coding sequence variants identified by the Variant Effect Predictor (Ensembl), expressed as mean percentage values across all cell lines (values were not statistically different among samples).

Table 2. Detailed distribution of identified variants for each cell line.

Absolute number, median, median absolute deviation and range interval are listed for each category of mutation according to Variant Effect Predictor classification (Ensembl).

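| Class | Type | Consequence | CP-D | ESO26 | ESO51 | FLO-1 | JH-EsoAd1 | OACM5.1 C | OACP4 C | OE33 | SK-GT-4 | Median | Median absolute deviation | Min | Max |
|---|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Coding variants (type) | UTR | 5 prime UTR | 229 | 301 | 262 | 191 | 206 | 264 | 229 | 216 | 305 | 229 | 33 | 191 | 305 |
| | | 3 prime UTR | 979 | 1097 | 1002 | 926 | 929 | 1026 | 848 | 986 | 1113 | 986 | 57 | 848 | 1113 |
| | Start/stop | Initiator codon | 1 | 3 | 2 | 2 | 3 | 2 | 1 | 0 | 1 | 2 | 1 | 0 | 3 |
| | | Stop lost | 2 | 2 | 4 | 2 | 2 | 2 | 3 | 3 | 2 | 2 | 0 | 2 | 4 |
| | | Stop retained | 2 | 1 | 4 | 2 | 2 | 1 | 2 | 2 | 2 | 2 | 0 | 1 | 4 |
| | | Stop gained | 10 | 14 | 17 | 16 | 14 | 17 | 9 | 14 | 24 | 14 | 3 | 9 | 24 |
| | Missense | Missense | 385 | 496 | 497 | 436 | 435 | 481 | 431 | 446 | 454 | 446 | 15 | 385 | 497 |
| | Splice sites | Splice acceptor | 4 | 11 | 7 | 8 | 11 | 11 | 9 | 7 | 7 | 8 | 1 | 4 | 11 |
| | | Splice donor | 5 | 7 | 6 | 10 | 6 | 9 | 6 | 5 | 18 | 6 | 1 | 5 | 18 |
| | | Splice region | 105 | 113 | 107 | 92 | 96 | 95 | 83 | 103 | 102 | 102 | 6 | 83 | 113 |
| | Frameshift indel | Frameshift | 42 | 52 | 41 | 45 | 34 | 34 | 49 | 46 | 54 | 45 | 4 | 34 | 54 |
| | In-frame indel | In-frame deletion | 11 | 10 | 15 | 18 | 15 | 14 | 10 | 15 | 20 | 15 | 3 | 10 | 20 |
| | | In-frame insertion | 10 | 17 | 19 | 8 | 14 | 10 | 11 | 8 | 16 | 11 | 3 | 8 | 19 |
| | Synonymous | Synonymous | 199 | 278 | 284 | 259 | 221 | 283 | 202 | 208 | 242 | 242 | 36 | 199 | 284 |
| | Other | Other | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 |
| Non-coding variants (region) | Gene boundaries | Downstream | 19197 | 20411 | 18927 | 18009 | 17711 | 19363 | 16202 | 18463 | 20318 | 18927 | 918 | 16202 | 20411 |
| | | Upstream | 19197 | 20761 | 19332 | 18122 | 18196 | 20182 | 16825 | 18944 | 21239 | 19197 | 1001 | 16825 | 21239 |
| | Intergenic | Intergenic | 29694 | 38091 | 34040 | 31999 | 27269 | 31875 | 21550 | 32985 | 33380 | 31999 | 2041 | 21550 | 38091 |
| | Introns | Introns | 55372 | 61682 | 56671 | 54869 | 51163 | 56193 | 43210 | 55945 | 61374 | 55945 | 1076 | 43210 | 61682 |
| | Non-coding transcripts | Mature miRNA | 8 | 13 | 6 | 6 | 5 | 10 | 5 | 8 | 4 | 6 | 2 | 4 | 13 |
| | | Non-coding transcript | 1 | 2 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 2 |
| | | Non-coding transcript exon | 2149 | 2200 | 2116 | 1868 | 1920 | 2113 | 1811 | 2095 | 2310 | 2113 | 87 | 1811 | 2310 |
| | Regulatory regions | TF binding site | 404 | 453 | 469 | 431 | 413 | 500 | 408 | 440 | 486 | 440 | 29 | 404 | 500 |
| | | Regulatory region | 4667 | 5863 | 5301 | 4686 | 4512 | 5011 | 3582 | 4778 | 6158 | 4778 | 266 | 3582 | 6158 |
| Total | | | 132674 | 151879 | 139131 | 132006 | 123179 | 137498 | 105487 | 135718 | 147631 | 135718 | 3712 | 105487 | 151879 |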
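The percentages quoted in the text for coding consequences can be approximately reproduced from the Median column of Table 2, and the dispersion column appears consistent with an unscaled median absolute deviation (no 1.4826 normal-consistency factor). The Python sketch below checks both using values copied from the table; because the text reports means across cell lines (e.g. 57.4% UTR) while this uses medians, small differences are expected.

```python
"""Reproduce coding-consequence shares and the MAD column from Table 2 (illustration)."""

from statistics import median

# Median column of Table 2, coding-variant rows (copied from the table).
CODING_MEDIANS = {
    "5 prime UTR": 229, "3 prime UTR": 986,
    "initiator codon": 2, "stop lost": 2, "stop retained": 2, "stop gained": 14,
    "missense": 446,
    "splice acceptor": 8, "splice donor": 6, "splice region": 102,
    "frameshift": 45, "inframe deletion": 15, "inframe insertion": 11,
    "synonymous": 242, "other": 1,
}

# Per-cell-line missense counts (CP-D ... SK-GT-4), copied from Table 2.
MISSENSE = [385, 496, 497, 436, 435, 481, 431, 446, 454]


def percentage_shares(counts):
    """Each category as a percentage of the summed counts."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


def median_absolute_deviation(values):
    """Unscaled MAD: the median of absolute deviations from the median."""
    m = median(values)
    return median([abs(v - m) for v in values])


if __name__ == "__main__":
    shares = percentage_shares(CODING_MEDIANS)
    utr = shares["5 prime UTR"] + shares["3 prime UTR"]
    print(f"UTR {utr:.1f}%, missense {shares['missense']:.1f}%, "
          f"synonymous {shares['synonymous']:.1f}%")
    print(median(MISSENSE), median_absolute_deviation(MISSENSE))
```

With the table's medians this gives roughly 57.6% UTR, 21.1% missense and 11.5% synonymous, in line with the text, and a MAD of 15 for missense variants, matching the table.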

A limitation of this study is the lack of a matched normal sample. To mitigate this, in addition to the GATK calling pipeline, we applied a series of filters, following the criteria reported in Methods and derived from the 1000 Genomes Project (The 1000 Genomes Project Consortium et al., 2012), dbSNP (Ensembl v.58) and ESP6500 (released June 20th 2012). This approach reduced the number of variants by more than an order of magnitude relative to the original GATK call set (from a median of 4.1 × 10⁶ to 1.3 × 10⁵). Nevertheless, the number of called variants remains high compared with the range of 4.8 × 10³ to 6 × 10⁴ reported in human EAC (Weaver et al., 2014), which may indicate that a proportion of the variants in our final annotation are of germline origin. Additional mutations may also have accumulated in vitro. A comprehensive annotation of the coding sequence variants identified is reported in Supplementary material 3 and Supplementary material 4.
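The population-frequency filters can be expressed as a simple per-variant rule. The Python sketch below encodes the thresholds stated in Methods (1000 Genomes GMAF > 0.0014, ESP6500 frequency ≥ 0.00025) and our reading of the dbSNP criterion, namely that any reported dbSNP minor allele frequency marks a variant as likely germline; that interpretation, the `Variant` record and the example coordinates are hypothetical illustrations rather than the authors' code.

```python
"""Sketch of the population-frequency germline filters described in Methods."""

from dataclasses import dataclass
from typing import Optional

GMAF_MAX = 0.0014   # 1000 Genomes global MAF above this -> likely germline
ESP_MAX = 0.00025   # ESP6500 frequency at or above this -> likely germline


@dataclass
class Variant:
    """Hypothetical variant record with annotated population frequencies."""
    chrom: str
    pos: int
    ref: str
    alt: str
    gmaf: Optional[float] = None       # 1000 Genomes GMAF; None if not reported
    dbsnp_maf: Optional[float] = None  # dbSNP MAF; None if not reported
    esp_freq: Optional[float] = None   # ESP6500 frequency; None if not reported


def likely_germline(v: Variant) -> bool:
    """True if any population-frequency rule flags the variant."""
    if v.gmaf is not None and v.gmaf > GMAF_MAX:
        return True
    # Assumed interpretation: any reported dbSNP MAF counts as germline evidence.
    if v.dbsnp_maf is not None:
        return True
    if v.esp_freq is not None and v.esp_freq >= ESP_MAX:
        return True
    return False


def split_variants(variants):
    """Partition variants into (retained, likely_germline) lists."""
    kept, germline = [], []
    for v in variants:
        (germline if likely_germline(v) else kept).append(v)
    return kept, germline


if __name__ == "__main__":
    calls = [
        Variant("chr1", 1000, "C", "T"),                 # no population data: kept
        Variant("chr1", 2000, "G", "A", gmaf=0.30),      # common polymorphism
        Variant("chr1", 3000, "A", "G", esp_freq=1e-4),  # rare in ESP6500: kept
    ]
    kept, germline = split_variants(calls)
    print(len(kept), len(germline))
```

Without a matched normal, such rules can only remove known polymorphisms; private germline variants absent from these databases pass through, which is consistent with the high variant counts discussed above.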

Analysis of putative EAC driver genes

To investigate how closely the cell lines reflect the spectrum of mutations observed in human specimens, we analysed the mutational landscape of known cancer genes and putative EAC driver genes and compared it with previously reported mutation rates (Dulak et al., 2013; Weaver et al., 2014; Figure 2b and 2c). TP53 mutations are found in 69% of EACs (Weaver et al., 2014), whereas all cell lines carried at least one deleterious TP53 mutation. A SMAD4 mutation was present in 2 of 9 cell lines (ESO26 and JH-EsoAd1), consistent with the 13% observed in EAC (Weaver et al., 2014). We did not identify coding mutations in ARID1A, which is reportedly mutated in about 10% of EAC specimens; ARID1A was affected only by UTR variants, in 1 of 9 cell lines. Only some of the missense variants in the genes shown in Figure 2b corresponded to known pathogenic mutations (namely in TP53, PIK3CA and TLR4). Other genes harboured benign or likely benign variants and/or variants of uncertain functional significance.
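One way to make "consistent with" quantitative is a binomial calculation. The Python sketch below computes the probability of observing at least k mutated lines out of n if each line were an independent draw from primary EAC with the reported mutation rate. This is an illustrative calculation, not an analysis performed in the study, and the independence assumption is strong: cell-line derivation may select for particular genotypes, and CP-D is a high-grade dysplasia rather than an EAC line.

```python
"""Binomial check of cell-line mutation frequencies against tumour rates (illustration)."""

from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Mutation rates in primary EAC as quoted in the text (Weaver et al., 2014).
    print(f"TP53, 9 of 9 lines at p=0.69: {prob_at_least(9, 9, 0.69):.3f}")
    print(f"SMAD4, >=2 of 9 lines at p=0.13: {prob_at_least(2, 9, 0.13):.3f}")
```

Under these assumptions, at least two SMAD4-mutant lines would occur about one time in three, whereas TP53 mutations in all nine lines would be expected only about 3.5% of the time, hinting that these lines may be enriched for TP53 mutation relative to primary tumours.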

Figure 2. Analysis of SNVs and CNAs in putative EAC genes identified in Dulak et al. (2013) and Weaver et al. (2014).


A) Log ratio of the copy number status of the selected genes computed with Control-FREEC (green indicates CN gain and red CN loss). Genome-wide CN for each line is available in Supplementary material 1 and Supplementary material 3. B) SNVs identified by our pipeline and annotated by Variant Effect Predictor analysis (Ensembl). When more than one variant was present in a single gene, the most deleterious was annotated according to the color-coded legend at the bottom of the figure. A complete annotation of the identified SNVs is available in Supplementary material 2. C) Blue and red bars indicate the mutation rates of EAC genes reported in Dulak et al., 2013 and Weaver et al., 2014, respectively.
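The rule of reporting only the most deleterious variant per gene can be sketched as a lookup into a severity ranking. The Python example below uses a subset of the Ensembl VEP consequence terms in what we understand to be VEP's published severity order (most severe first); the legend in Figure 2 may rank categories differently, so the ordering is an assumption rather than the authors' exact scheme.

```python
"""Pick the most severe consequence per gene (illustrative severity order)."""

# Subset of Ensembl VEP consequence terms, most severe first (assumed ordering).
SEVERITY_ORDER = [
    "splice_acceptor_variant",
    "splice_donor_variant",
    "stop_gained",
    "frameshift_variant",
    "stop_lost",
    "start_lost",
    "inframe_insertion",
    "inframe_deletion",
    "missense_variant",
    "splice_region_variant",
    "stop_retained_variant",
    "synonymous_variant",
    "5_prime_UTR_variant",
    "3_prime_UTR_variant",
    "intron_variant",
    "upstream_gene_variant",
    "downstream_gene_variant",
]
RANK = {term: i for i, term in enumerate(SEVERITY_ORDER)}
UNRANKED = len(SEVERITY_ORDER)  # terms not listed sort below all listed terms


def most_severe_per_gene(calls):
    """Map each gene to its most severe consequence.

    calls: iterable of (gene, consequence_term) pairs.
    """
    worst = {}
    for gene, consequence in calls:
        rank = RANK.get(consequence, UNRANKED)
        if gene not in worst or rank < RANK.get(worst[gene], UNRANKED):
            worst[gene] = consequence
    return worst


if __name__ == "__main__":
    example = [  # hypothetical calls for illustration
        ("TP53", "missense_variant"),
        ("TP53", "stop_gained"),
        ("ARID1A", "3_prime_UTR_variant"),
    ]
    print(most_severe_per_gene(example))
```

Keeping the ranking in a single ordered list makes it easy to align with whatever legend a figure uses: reordering the list changes the summary without touching the selection logic.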

We extended our analysis to other cancer genes of potential relevance to EAC. We identified a pathogenic KRAS mutation in SK-GT-4, and missense variants of uncertain significance in MET (OE33) and EGFR (CP-D, ESO26 and JH-EsoAd1). Among DNA repair genes, all cell lines carry benign missense variants of ATM and missense variants of uncertain significance in BRCA2. MSH2 is affected by a missense variant in SK-GT-4, splice site variants in CP-D and JH-EsoAd1, and UTR variants in ESO51 and OACP4 C (Supplementary material 3, Supplementary material 4, Supplementary material 6). Copy number analysis (Supplementary material 1, Supplementary material 2) identified recurrent amplifications of ERBB2, MYC, MET and SEMA5A, and deletions of SMAD4, CDKN2A, CCDC102B and SMARCA4.

These sequencing data will enable the research community to undertake and interpret further analyses (publicly available datasets for these lines are reviewed in Supplementary material 5) and will inform the use of these cell lines as models of EAC. Our data highlight the need to develop additional in vitro models with a matched germline reference genome, so that somatic changes can be identified clearly (Gazdar et al., 1998). A larger number of cell lines might also more closely recapitulate the range of mutations observed in human disease.

Data availability

The data referenced by this article are under copyright with the following copyright statement: Copyright: © 2016 Contino G et al.

BAM files are available at the European Nucleotide Archive (ENA, EMBL-EBI, www.ebi.ac.uk/ena, Study PRJEB14018). Accession numbers: CP-D ERS1158083; SK-GT-4 ERS1158082; OE33 ERS1158081; OACP4 C ERS1158080; OACM5.1 ERS1158079; JH-EsoAd1 ERS1158078; FLO-1 ERS1158077; ESO51 ERS1158076; ESO26 ERS1158075.

Funding Statement

This work was funded by an MRC Programme Grant to R.C.F. and a Cancer Research UK grant to PAWE. The pipeline for mutation calling is funded by Cancer Research UK as part of the International Cancer Genome Consortium. G.C. is a National Institute for Health Research Lecturer as part of a NIHR professorship grant to R.C.F. AGL is supported by a Cancer Research UK programme grant (C14303/A20406) to Simon Tavaré and the European Commission through the Horizon 2020 project SOUND (Grant Agreement no. 633974).

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

[version 1; referees: 3 approved]

Supplementary material

Supplementary material 1. A) Copy number alterations of EAC cell lines according to ploidy, shown by FREEC plots (loss, normal and gain are indicated in blue, green and red, respectively). Genes annotated in red are COSMIC Cancer Gene Census genes that fall in amplified regions, defined as copy number ≥5 for diploid and ≥7 for triploid and tetraploid cell lines. Genes annotated in blue are COSMIC Cancer Gene Census genes that fall in deleted regions (CN ≤1). B) Tables reporting all COSMIC Cancer Gene Census genes that fall in deleted or amplified regions according to FREEC. Cell lines are shown in the following order: 1) CP-D, 2) ESO26, 3) ESO51, 4) FLO-1, 5) JH-EsoAd1, 6) OACM5.1 C, 7) OACP4 C, 8) OE33, 9) SK-GT-4.
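The amplification and deletion thresholds used in Supplementary material 1 are ploidy-dependent, which can be written as a small classification function. In this Python sketch, rounding a fractional ploidy estimate (e.g. 1.8 or 3.5 in Table 1) to the nearest integer before choosing a threshold is our assumption; the supplementary legend only distinguishes diploid from triploid and tetraploid lines.

```python
"""Classify segment copy numbers with the Supplementary material 1 thresholds."""


def classify_copy_number(cn, ploidy):
    """Return 'amplified', 'deleted' or 'not called' for a segment copy number.

    Amplified: CN >= 5 in diploid lines, CN >= 7 in triploid/tetraploid lines.
    Deleted: CN <= 1.
    `ploidy` is an integer baseline ploidy (2, 3 or 4); rounding fractional
    ploidy estimates to an integer is an assumption of this sketch.
    """
    if ploidy == 2:
        amp_min = 5
    elif ploidy in (3, 4):
        amp_min = 7
    else:
        raise ValueError(f"unsupported ploidy: {ploidy}")
    if cn >= amp_min:
        return "amplified"
    if cn <= 1:
        return "deleted"
    return "not called"


if __name__ == "__main__":
    # A hypodiploid line (ploidy 1.8) is treated as diploid after rounding.
    print(classify_copy_number(6, round(1.8)))
    print(classify_copy_number(6, round(3.5)))
```

Raising the amplification threshold for higher-ploidy lines keeps a copy number of 5 or 6, which is only a modest relative gain over a baseline of 3 or 4, from being reported as a focal amplification.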

Supplementary material 2. FREEC output of CNV by chromosome for the analysed cell lines. CNV of each cell line is shown by chromosome, consistent with known ploidy and in silico verification with the Crambled app (Lynch, 2015).

Supplementary material 3. Variant Effect Predictor-annotated VCF files of GATK-called variants for CP-D, ESO26, ESO51, FLO-1, JH-EsoAd1, OACM5.1 C, OACP4 C, OE33 and SK-GT-4 are available for download at the EMBL-EBI European Variation Archive (EVA, http://www.ebi.ac.uk/eva/) under the study PRJEB14018.

Supplementary material 4. Filtered variants: 1) CP-D, 2) ESO26, 3) ESO51, 4) FLO-1, 5) JH-EsoAd1, 6) OACM5.1 C, 7) OACP4 C, 8) OE33, 9) SK-GT-4.

Supplementary material 5. Publicly available datasets for the analysed cell lines. For each cell line, currently available datasets from COSMIC, the Broad-Novartis Cancer Cell Line Encyclopedia and GEO (Gene Expression Omnibus) are listed.

Supplementary material 6. Gitools-readable file containing mutation calls for all genes. When more than one variant was present in a single gene, the most deleterious was annotated according to the color-coded legend reported at the bottom of the figure. Gitools is freely available for download at www.gitools.org (Perez-Llamas & Lopez-Bigas, 2011).

Supplementary material 7. Alignment, duplication and insert size metrics for each cell line.

References

  1. Altorki N, Schwartz GK, Blundell M, et al.: Characterization of cell lines established from human gastric-esophageal adenocarcinomas. Biologic phenotype and invasion potential. Cancer. 1993;72(3):649–57.
  2. Alvarez H, Koorstra JB, Hong SM, et al.: Establishment and characterization of a bona fide Barrett esophagus-associated adenocarcinoma cell line. Cancer Biol Ther. 2008;7(11):1753–5. 10.4161/cbt.7.11.6723
  3. Boeva V, Popova T, Bleakley K, et al.: Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics. 2012;28(3):423–5. 10.1093/bioinformatics/btr670
  4. Boonstra JJ, van Marion R, Beer DG, et al.: Verification and unmasking of widely used human esophageal adenocarcinoma cell lines. J Natl Cancer Inst. 2010;102(4):271–4. 10.1093/jnci/djp499
  5. de Both NJ, Wijnhoven BP, Sleddens HF, et al.: Establishment of cell lines from adenocarcinomas of the esophagus and gastric cardia growing in vivo and in vitro. Virchows Arch. 2001;438(5):451–6. 10.1007/s004280000358
  6. Dulak AM, Stojanov P, Peng S, et al.: Exome and whole-genome sequencing of esophageal adenocarcinoma identifies recurrent driver events and mutational complexity. Nat Genet. 2013;45(5):478–86. 10.1038/ng.2591
  7. Gazdar AF, Kurvari V, Virmani A, et al.: Characterization of paired tumor and non-tumor cell lines established from patients with breast cancer. Int J Cancer. 1998;78(6):766–74.
  8. Hughes SJ, Nambu Y, Soldes OS, et al.: Fas/APO-1 (CD95) is not translocated to the cell membrane in esophageal adenocarcinoma. Cancer Res. 1997;57(24):5571–8.
  9. Lynch A: Crambled: A Shiny application to enable intuitive resolution of conflicting cellularity estimates [version 1; referees: 2 approved]. F1000Res. 2015;4:1407. 10.12688/f1000research.7453.1
  10. Nones K, Waddell N, Wayte N, et al.: Genomic catastrophes frequently arise in esophageal adenocarcinoma and drive tumorigenesis. Nat Commun. 2014;5:5224. 10.1038/ncomms6224
  11. Palanca-Wessels MC, Barrett MT, Galipeau PC, et al.: Genetic analysis of long-term Barrett's esophagus epithelial cultures exhibiting cytogenetic and ploidy abnormalities. Gastroenterology. 1998;114(2):295–304. 10.1016/S0016-5085(98)70480-9
  12. Perez-Llamas C, Lopez-Bigas N: Gitools: analysis and visualisation of genomic data using interactive heat-maps. PLoS One. 2011;6(5):e19541. 10.1371/journal.pone.0019541
  13. Rockett JC, Larkin K, Darnton SJ, et al.: Five newly established oesophageal carcinoma cell lines: phenotypic and immunological characterization. Br J Cancer. 1997;75(2):258–63. 10.1038/bjc.1997.42
  14. The 1000 Genomes Project Consortium, Abecasis GR, Auton A, et al.: An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491(7422):56–65. 10.1038/nature11632
  15. Weaver JM, Ross-Innes CS, Shannon N, et al.: Ordering of mutations in preinvasive disease stages of esophageal carcinogenesis. Nat Genet. 2014;46(8):837–43. 10.1038/ng.3013
  16. Zerbino DR, Wilder SP, Johnson N, et al.: The ensembl regulatory build. Genome Biol. 2015;16(1):56. 10.1186/s13059-015-0621-5

In this study Contino et al. present their WGS analysis of 9 (verified) oesophageal adenocarcinoma cell lines. This is an adequate platform to present these data, and the fact that the authors make all raw BAM files easily accessible to the community makes this study particularly valuable to colleagues looking to contrast cell lines with particular genomic aberrations or different neo-antigenic burdens. Such studies always come with the known caveats of in vitro selection, and the authors rightfully acknowledge this. As expected, the study in large part confirms earlier large-scale sequencing studies of primary material. The lack of a patient-specific reference control means that the impact of more subtle genomic abnormalities, for example in regulatory regions, remains difficult to study. Nonetheless, this work represents a valuable addition to previously published datasets and the authors are to be commended for publishing this analysis. The paper is terse and I enjoyed reading this study.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.


The authors have performed whole genome sequencing of eight esophageal adenocarcinoma cell lines and one esophageal high-grade dysplasia cell line to an average depth of 30×. The authors have made the BAM and VCF files available through the EBI repository, and this will be an excellent resource for researchers working on this cancer. We feel the methods used are appropriate and most of the analyses described are informative. We do, however, have a few suggestions for the authors to address; these are listed below:

Dataset validation WGS section:

  1. Clarify the % of variants that fall in each sequence context, coding, intronic, regulatory, intergenic. We assume this should sum to 100%.
  2. In the next sentence there is a "(" instead of a "," in front of "21% and 11% respectively".
  3. Table 1: ploidy state of CP-D, should this be hypotetraploid?
  4. Paragraph 2 of this section: change 4,8×10³ to 4.8×10³.
  5. MuTect was used as variant caller in the Dulak paper and SomaticSniper was used in the Weaver paper. The authors should explain that they can’t use a somatic variant caller as these require a "normal" sample and also that application of a different caller for this cell line project may also make comparisons with the Dulak and Weaver papers less powerful.

Analysis of putative EAC driver genes:

  1. There isn’t an ARID1A UTR variant shown for any of the cell lines in Figure 2b yet the authors mention 1 of the 9 cell lines has such a variant in the text.
    On a related note, we think the authors should consider the relevance of including UTR and synonymous changes in Figure 2b. We don't think that these are considered in the Dulak and Weaver papers, and they are, as far as we understand, unlikely to be functional.
  2. Second sentence of the second paragraph needs clarifying. Presumably missense mutations were found in MET and EGFR? IH-EsoAd1 should be JH-EsoAd1 in the same sentence.
  3. Authors should make more of the fact that they have sequenced whole genomes whereas the COSMIC cell line project has only sequenced cell line exomes. The authors could perhaps highlight the useful extra data that is available from this sequencing effort, such as identification of mutations in putative regulatory regions and germline variants. Both classes of variants will be of interest to researchers working on understanding the genetics of oesophageal adenocarcinoma and wishing to identify appropriate cell models to work with.

We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.


The authors have examined the DNA sequences of 8 oesophageal adenocarcinoma cell lines and one high-grade dysplasia cell line. The authors should be congratulated for tackling this important unmet need in oesophageal cancer research and for publishing these important findings in such an accessible manner. As the authors state, oesophageal adenocarcinoma seems to be one of the cancers carrying the most mutations, and although several cell lines, including those used in this study, are commonly used for laboratory studies, there has never been a systematic study of the genetic abnormalities in these cell lines. The data in this study fill that important gap, allowing comparisons between them and the cancer in vivo.

The methods are appropriate for the study and well-described and the abstract accurately represents the contents of the study. The results are appropriately and clearly presented. The conclusions appear to be sound based on the data presented and most importantly the paper provides the data to enable other researchers to build on these data and hopefully further refine laboratory models for oesophageal adenocarcinoma.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
