Discrepancies in cancer genomic sequencing highlight opportunities for driver mutation discovery

Andrew M Hudson et al. Cancer Res. 2014.

Abstract

Cancer genome sequencing is increasingly used to identify actionable driver mutations that can inform therapeutic intervention strategies. A comparison of two of the most prominent cancer genome sequencing databases from different institutes (Cancer Cell Line Encyclopedia and Catalogue of Somatic Mutations in Cancer) revealed marked discrepancies in the detection of missense mutations in identical cell lines (57.38% conformity). The main reason for this discrepancy is inadequate sequencing of GC-rich areas of the exome. We have therefore mapped over 400 regions of consistently inadequate sequencing (cold-spots) in known cancer-causing genes and kinases, in 368 of which neither institute finds mutations. We demonstrate, using a newly identified PAK4 mutation as proof of principle, that specific targeting and sequencing of these GC-rich cold-spot regions can lead to the identification of novel driver mutations in known tumor suppressors and oncogenes. We highlight that cross-referencing between genomic databases is required to comprehensively assess genomic alterations in commonly used cell lines and that there are still significant opportunities to identify novel drivers of tumorigenesis in poorly sequenced areas of the exome. Finally, we assess other reasons for the observed discrepancy, such as variations in dbSNP filtering and the acquisition or loss of mutations, to help explain discrepancies between pharmacogenomic studies, given recent concerns about poor reproducibility of data.

©2014 American Association for Cancer Research.

Figures

Figure 1

Marked discrepancy is seen in mutation calling between CCLE and COSMIC. a) Overall percentage conformity of 46,409 mutations detected by COSMIC and/or CCLE. The intersection between datasets (mutations found by both institutes) accounted for 57.38%. COSMIC-only mutations comprised 32.71% of the dataset and CCLE-only mutations 9.91%. b) The percentage agreement between mutations reported in the 568 cell lines sequenced by both institutes.
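
The three percentages in panel a sum to 100%, which is consistent with conformity being measured against the union of both call sets. A minimal sketch of that calculation follows, assuming each mutation can be reduced to a comparable key. The `Mutation` tuple layout, the `conformity` function, and the toy records are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Set, Tuple

# Hypothetical mutation key: (cell_line, gene, protein_change).
Mutation = Tuple[str, str, str]


def conformity(cosmic: Set[Mutation], ccle: Set[Mutation]) -> Dict[str, float]:
    """Return the percentage of the union found by both, COSMIC only, and CCLE only."""
    union = cosmic | ccle
    if not union:
        raise ValueError("no mutations in either call set")
    total = len(union)
    return {
        "both": 100.0 * len(cosmic & ccle) / total,
        "cosmic_only": 100.0 * len(cosmic - ccle) / total,
        "ccle_only": 100.0 * len(ccle - cosmic) / total,
    }


# Synthetic records for illustration only (not real calls).
cosmic_calls = {
    ("LINE_1", "GENE_A", "p.X1Y"),
    ("LINE_1", "GENE_B", "p.X2Y"),
    ("LINE_2", "GENE_C", "p.X3Y"),
}
ccle_calls = {
    ("LINE_1", "GENE_A", "p.X1Y"),
    ("LINE_2", "GENE_C", "p.X3Y"),
    ("LINE_2", "GENE_D", "p.X4Y"),
}

result = conformity(cosmic_calls, ccle_calls)
```

In practice the hard part is normalising the keys (cell line aliases, transcript choice, annotation style) so that identical mutations compare equal; this sketch assumes that has already been done.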

Figure 2

In the original 18 cell line comparison, mutations detected by COSMIC but not CCLE were categorised into: poor coverage with 5 or fewer reads (Panel a); good read coverage (over 20 reads) and mutation detected on reads but annotated as a dbSNP, neutral variant, outside coding region in all transcripts, or detected on fewer than 10% of reads, and removed (Panel b); and good coverage, no mutation (Panel c). Panel d reveals that the most common cause for mutations being missed by CCLE was poor read coverage (41%). Images of read coverage were taken using the Integrative Genomics Viewer.
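
The categories in this caption can be read as a small decision rule on read depth. The sketch below is one possible encoding, not the authors' pipeline: the function name `classify_missed_call` and its return labels are hypothetical, and the caption does not say how sites with 6 to 20 reads were handled, so they are returned as unclassified here.

```python
def classify_missed_call(depth: int, alt_reads: int) -> str:
    """Assign a COSMIC-only mutation to a panel, given CCLE read counts at that site.

    depth     -- total CCLE reads covering the position
    alt_reads -- CCLE reads supporting the mutant allele
    """
    if depth < 0 or alt_reads < 0 or alt_reads > depth:
        raise ValueError("read counts must satisfy 0 <= alt_reads <= depth")
    if depth <= 5:
        # Panel a: poor coverage (5 or fewer reads).
        return "a: poor coverage"
    if depth > 20:
        if alt_reads > 0:
            # Panel b: the variant is visible on reads but was filtered out
            # (dbSNP, neutral, non-coding, or on fewer than 10% of reads).
            return "b: detected on reads but removed"
        # Panel c: well covered, no supporting reads.
        return "c: good coverage, no mutation"
    # Assumption: intermediate depth is not described in the caption.
    return "unclassified: intermediate coverage"
```

For example, a site with 3 reads falls in panel a regardless of how many reads carry the variant, while a site with 40 reads and no variant reads falls in panel c.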

Figure 3

The 20 largest cold-spots detected in cancer census or kinase gene transcripts (of those that were sequenced by both COSMIC and CCLE hybrid capture) using CCLE whole exome sequencing data. All but one of these cold-spots are located in a high GC-content area and result in no mutations being detected by either institute. The TET2 cold-spot is not located in a high GC-content area and contains mutations detected by COSMIC, indicating that this cold-spot was not present in the COSMIC data. The outer shaded grey plot shows the GC content at each base (calculated as 50bp either side) with GC content over 70% shaded in red. The middle light green plot shows sequencing read coverage with white troughs representing poor read coverage. The inner 3 rings record the position of mutations found by both institutes (orange), COSMIC only (violet) and CCLE only (green). Light blue shards show cold-spots over 100bp in length with the top 20 shaded darker. Data were plotted using a combination of Circos and custom scripts.
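
The two quantities plotted here, per-base GC content and runs of poor coverage, can be sketched as follows. This is an illustrative reading of the caption rather than the authors' custom scripts: the function names are hypothetical, and the depth cut-off that defines a cold-spot is not given in this text, so the `max_depth=5` default is an assumption borrowed from the poor-coverage threshold in Figure 2.

```python
from typing import List, Tuple


def gc_fraction(seq: str, pos: int, flank: int = 50) -> float:
    """GC fraction in a window of `flank` bases either side of `pos` (0-based).

    The window is truncated at the ends of the sequence.
    """
    if not 0 <= pos < len(seq):
        raise IndexError("pos is outside the sequence")
    window = seq[max(0, pos - flank):pos + flank + 1].upper()
    return sum(base in "GC" for base in window) / len(window)


def find_cold_spots(
    depths: List[int], max_depth: int = 5, min_length: int = 101
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs where depth <= max_depth.

    Only runs of at least `min_length` bases are kept; 101 corresponds to
    "over 100bp". `max_depth` is an assumed threshold, not a published one.
    """
    spots: List[Tuple[int, int]] = []
    run_start = None
    # A sentinel above the threshold closes a run that reaches the end.
    for i, depth in enumerate(depths + [max_depth + 1]):
        if depth <= max_depth:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_length:
                spots.append((run_start, i))
            run_start = None
    return spots


# Synthetic coverage track: one 150-base trough and one 50-base trough.
toy_depths = [30] * 10 + [2] * 150 + [30] * 10 + [1] * 50
toy_spots = find_cold_spots(toy_depths)
```

Only the 150-base trough is reported for the toy track, because the 50-base trough is shorter than the minimum length. A base would be flagged as high GC under the caption's colouring when `gc_fraction` exceeds 0.70.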
