Migrating the SNP array-based homologous recombination deficiency measures to next generation sequencing data of breast cancer
Abstract
The first genomic scar-based homologous recombination deficiency (HRD) measures were produced using SNP arrays. As array-based technology has been largely replaced by next generation sequencing (NGS), it has become important to develop algorithms that derive the same type of genomic scar scores from whole exome (WXS) and whole genome (WGS) sequencing data. For this purpose we introduce the scarHRD R package. Using this method, SNP array-based and NGS-based HRD scores show good correlation (Pearson correlation between 0.73 and 0.87, depending on the HRD measure), and the NGS-based HRD scores distinguish similarly well between BRCA mutant and BRCA wild-type cases in a cohort of triple-negative breast cancer patients from the TCGA data set.
Introduction
Reliable quantification of homologous recombination deficiency of human tumor biopsies, especially in the case of ovarian and breast cancer, is expected to identify patients that are particularly sensitive to platinum or PARP inhibitor-based therapy.1 Before the widespread introduction of next generation sequencing (NGS) to characterize tumor biopsies, SNP arrays were used to identify large-scale genomic aberrations associated with homologous recombination deficiency, often induced by the loss of BRCA1 or BRCA2 function. Three such measures were identified: telomeric allelic imbalance (HRD-TAI score),2 loss of heterozygosity profiles (HRD-LOH score),3 and large-scale state transitions (HRD-LST score).4 These three measures have also been combined into a single summary measure of HR deficiency.5 The HRD-LOH score has also become an integral part of a recently published, whole-genome sequencing-based measure of homologous recombination deficiency, HRDetect.6 These measures, along with functional assays,7 showed promise to identify HR-deficient cases and thus predict response to platinum or PARP inhibitor therapy.2,8,9 Since NGS has become the main genomic characterization method of cancer biopsies, it has become essential to migrate the SNP array-based methodology to NGS-based platforms.
TCGA breast cancer biopsies have been both profiled on SNP arrays and subjected to NGS, allowing a direct comparison of the two platforms.8
Results and discussion
We found good correlation between the SNP array-based and NGS-based HRD scores (Fig. 1). When comparing the results of the scarHRD R package to SNP array-based measurements, we found the following Pearson correlation coefficients: number of telomeric allelic imbalances (NtAI), r = 0.84 (R² = 0.70, adjusted R² = 0.70, p < 2.2e–16); large-scale state transitions (LST), r = 0.79 (R² = 0.62, adjusted R² = 0.62, p < 2.2e–16); and loss of heterozygosity (HRD-LOH), r = 0.73 (R² = 0.53, adjusted R² = 0.52, p < 2.2e–16). These three measures are often combined for diagnostic purposes5 and in HRDetect.6 Therefore, we also compared the sum of the three scores across the two platforms (HRD-sum): r = 0.87 (R² = 0.75, adjusted R² = 0.75, p < 2.2e–16) (Fig. 1). Artificially reducing the coverage to 30× did not affect this correlation (Supplementary Figures S7 and S8). The BRCA1/2-mutated samples showed significantly higher NGS-based HRD-sum values (Fig. 2, Supplementary Figure S6). The predictive value of HRD-sum, measured as the AUC of the corresponding ROC curve, was 80.8% (Supplementary Figure S2).
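As a consistency check on the values above: for a simple linear regression with a single predictor, R² equals the square of the Pearson r (for example 0.84² ≈ 0.71, 0.79² ≈ 0.62, 0.73² ≈ 0.53), which agrees with the reported pairs up to rounding of r. The minimal Python sketch below computes both quantities. It assumes that the reported R² values come from such a regression, and the input scores are made up for illustration; they are not TCGA data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical HRD-sum values for five samples (illustration only).
array_scores = [10, 25, 31, 42, 60]
wxs_scores = [12, 22, 35, 40, 55]

r = pearson_r(array_scores, wxs_scores)
r_squared = r ** 2  # equals R² of a one-predictor linear regression
print(f"r = {r:.2f}, R^2 = {r_squared:.2f}")
```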
Fig. 1.
Correlation between Affymetrix SNP 6.0 array-based and whole exome sequencing-based measurements of homologous recombination deficiency (telomeric allelic imbalance, loss of heterozygosity, large-scale transitions, and the sum of these estimates)
Fig. 2.
Distribution of HRD-sum values in BRCA1/2 deficient and in BRCA1/2 intact triple-negative breast cancer samples from TCGA. HRD-sum values were determined with the scarHRD R package
There was no significant difference between the SNP array-based and WXS-based estimates of tAI, LST, and HRD-sum, but the number of LOH events was significantly lower in the WXS-based estimation (p = 0.012, Kolmogorov–Smirnov test). This could be attributed to differences in the segmentation algorithm (the more segmented the WXS data are, the fewer LOH events are called) or to low sample quality or coverage. However, when comparing the ROC curves for BRCA1/2 status, there was no significant difference between the SNP array-based and NGS-based HRD scores (Supplementary Figure S3).
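The area under a ROC curve used in these comparisons has a simple interpretation: it is the probability that a randomly chosen BRCA1/2-deficient sample has a higher score than a randomly chosen BRCA1/2-intact sample, with ties counted as one half. The Python sketch below computes the AUC directly from that definition. The function name and the example scores are hypothetical, and this is not necessarily how the ROC analysis in this study was implemented.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5. Quadratic in the number of samples, which is
    acceptable for cohort sizes of a few hundred.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical HRD-sum values (illustration only).
brca_deficient = [60, 45, 30]
brca_intact = [20, 35, 10, 50]
auc = roc_auc(brca_deficient, brca_intact)
```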
In line with our expectations and previous results, the BRCA1/2-deficient cases showed higher values for each of the four scores (Supplementary Figures S4 and S5).
The sum of the three HRD scores showed good correlation across the two platforms. Thus, in more advanced NGS-based HR deficiency measures such as HRDetect, the SNP array-based step could be replaced by an NGS-based estimate of the HR deficiency scores.
Brief description of the methods
Based on receptor status determined by immunohistochemistry, 139 paired tumor and normal samples of the TCGA breast cancer cohort could be classified as triple-negative breast cancer. Of these patients, 95 had Affymetrix SNP 6.0 array-based HRD estimates (LOH, TAI, LST) previously published by our group.10 In this publication we present the scarHRD R package (https://github.com/sztup/scarHRD), which estimates the level of the three HR deficiency measures using NGS data.
A sample’s LOH score is the total number of LOH regions across the entire genome that are larger than 15 Mb but do not cover whole chromosomes. In the original publication this 15 Mb lower limit for LOH was determined by comparing SNP array profiles between BRCA mutant and BRCA wild-type cases.3 We performed a similar analysis using NGS data and found that the original 15 Mb cutoff performed best in this case as well (Supplementary Figure S1).
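The counting rule above can be illustrated with a minimal Python sketch. It is not the scarHRD implementation: the `Segment` type and its field names are hypothetical, segments are assumed to be already merged so that neighbouring segments differ in allelic state, a segment is treated as LOH when its minor-allele copy number is 0, and a chromosome is treated as whole-chromosome LOH when every one of its segments is LOH.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical allele-specific copy number segment."""
    chrom: str
    start: int  # base pairs
    end: int    # base pairs
    major: int  # major-allele copy number
    minor: int  # minor-allele copy number


MIN_LOH_BP = 15_000_000  # 15 Mb lower limit from the text


def is_loh(seg):
    """Assumption: LOH means the minor allele is lost but DNA is present."""
    return seg.minor == 0 and seg.major > 0


def loh_score(segments):
    """Count LOH segments longer than 15 Mb, skipping whole-chromosome LOH."""
    by_chrom = defaultdict(list)
    for seg in segments:
        by_chrom[seg.chrom].append(seg)

    score = 0
    for segs in by_chrom.values():
        if all(is_loh(s) for s in segs):
            continue  # LOH covers the whole chromosome: not counted
        score += sum(
            1 for s in segs if is_loh(s) and s.end - s.start > MIN_LOH_BP
        )
    return score


example = [
    Segment("chr1", 0, 20_000_000, 2, 0),            # 20 Mb LOH: counted
    Segment("chr1", 20_000_000, 100_000_000, 1, 1),  # heterozygous
    Segment("chr2", 0, 150_000_000, 1, 0),           # whole chromosome: skipped
    Segment("chr3", 0, 10_000_000, 1, 0),            # 10 Mb LOH: too short
    Segment("chr3", 10_000_000, 90_000_000, 1, 1),   # heterozygous
]
score = loh_score(example)
```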
An LST is defined as a chromosomal break between adjacent regions of at least 10 Mb each, with a distance between them of no more than 3 Mb.
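A simplified Python sketch of this rule follows. It only checks the two size conditions stated above on consecutive segments of the same chromosome. The `Segment` type is hypothetical, neighbouring segments are assumed to differ in copy number state (so that each boundary is a real break), and any further preprocessing that a published implementation may apply, such as smoothing small segments or handling chromosome arms separately, is deliberately left out.

```python
from itertools import groupby
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical copy number segment; coordinates in base pairs."""
    chrom: str
    start: int
    end: int


MIN_SEGMENT_BP = 10_000_000  # both flanking regions must be at least 10 Mb
MAX_GAP_BP = 3_000_000       # distance between them must not exceed 3 Mb


def lst_score(segments):
    """Count breaks between adjacent long segments on the same chromosome."""
    ordered = sorted(segments, key=lambda s: (s.chrom, s.start))
    score = 0
    for _, group in groupby(ordered, key=lambda s: s.chrom):
        segs = list(group)
        for left, right in zip(segs, segs[1:]):
            both_long = (
                left.end - left.start >= MIN_SEGMENT_BP
                and right.end - right.start >= MIN_SEGMENT_BP
            )
            close = right.start - left.end <= MAX_GAP_BP
            if both_long and close:
                score += 1
    return score


example = [
    Segment("chr1", 0, 12_000_000),           # 12 Mb
    Segment("chr1", 12_000_000, 30_000_000),  # 18 Mb: break with previous counts
    Segment("chr1", 30_000_000, 35_000_000),  # 5 Mb: too short on either side
    Segment("chr1", 36_000_000, 60_000_000),  # 24 Mb, but left neighbour is short
    Segment("chr2", 0, 20_000_000),           # 20 Mb
    Segment("chr2", 25_000_000, 50_000_000),  # 25 Mb, but the gap is 5 Mb
]
score = lst_score(example)
```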
The number of telomeric allelic imbalances is the number of AIs (the unequal contribution of parental allele sequences with or without changes in the overall copy number of the region) that extend to the telomeric end of a chromosome.
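The definition above can be sketched in Python as follows. This is an illustration under stated assumptions, not the scarHRD code: the `Segment` type, the `chrom_lengths` mapping, and the 1 Mb `tolerance_bp` used to decide whether a segment reaches a chromosome end are all hypothetical, allelic imbalance is taken to mean unequal major and minor copy numbers, and a segment that reaches both ends (the whole chromosome) is not counted. Published implementations may apply further filters, for example around centromeres, that are omitted here.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical allele-specific copy number segment."""
    chrom: str
    start: int  # base pairs
    end: int    # base pairs
    major: int  # major-allele copy number
    minor: int  # minor-allele copy number


def ntai_score(segments, chrom_lengths, tolerance_bp=1_000_000):
    """Count allelically imbalanced segments that reach exactly one chromosome end."""
    score = 0
    for seg in segments:
        if seg.major == seg.minor:
            continue  # balanced: no allelic imbalance
        touches_start = seg.start <= tolerance_bp
        touches_end = seg.end >= chrom_lengths[seg.chrom] - tolerance_bp
        if touches_start != touches_end:  # exactly one end reached
            score += 1
    return score


# Made-up chromosome lengths and segments (illustration only).
chrom_lengths = {"chr1": 100_000_000, "chr2": 80_000_000, "chr3": 50_000_000}
example = [
    Segment("chr1", 0, 30_000_000, 2, 1),            # imbalanced, reaches start
    Segment("chr1", 30_000_000, 70_000_000, 1, 1),   # balanced
    Segment("chr1", 70_000_000, 100_000_000, 2, 0),  # imbalanced, reaches end
    Segment("chr2", 0, 80_000_000, 3, 1),            # whole chromosome: skipped
    Segment("chr3", 0, 10_000_000, 1, 1),            # balanced
    Segment("chr3", 10_000_000, 40_000_000, 2, 1),   # imbalanced but interstitial
    Segment("chr3", 40_000_000, 50_000_000, 1, 1),   # balanced
]
score = ntai_score(example, chrom_lengths)
```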
Allele-specific copy number estimation is a crucial part of estimating HR deficiency. As previously shown, allele-specific copy number estimation from NGS data performed using the Sequenza R package shows high agreement with SNP array-based copy number profiles.11 The scarHRD package is, therefore, able to use Sequenza preprocessed files as well as other allele-specific segmentation files in the same format.
As it has been previously shown that in ovarian cancer the sum of the genomic scar scores is elevated in BRCA-deficient cancers,5 an additional aim of our study was to compare the unweighted numeric sum of LOH, tAI, and LST, called here HRD-sum, to the BRCA1/2 status of the patients. A sample was classified as BRCA-deficient if there was (1) a deep deletion of BRCA1/2, (2) a germline and a somatic mutation in BRCA1/2 with LOH, or (3) LOH co-occurring with promoter methylation in one of the BRCA1/2 genes. The somatic mutation status (mutations with likely pathogenic function) and methylation data were acquired from the TCGA data portal. The germline mutation status was determined using HaplotypeCaller and annotated with InterVar;12 likely pathogenic mutations and frameshift insertions/deletions of unknown significance were used in our analysis. LOH was determined using Sequenza’s allele-specific segmentation results (Supplementary Table S1).
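The HRD-sum and the three classification criteria can be written down compactly. The Python sketch below is only an illustration of the logic described above: the `BrcaEvidence` type and its field names are hypothetical, and how germline and somatic calls are combined into the single `mutation` flag is left to the caller rather than decided here.

```python
from dataclasses import dataclass


@dataclass
class BrcaEvidence:
    """Hypothetical per-sample summary of BRCA1/2 alterations."""
    deep_deletion: bool = False         # criterion (1)
    mutation: bool = False              # qualifying mutation, criterion (2)
    promoter_methylation: bool = False  # criterion (3)
    loh: bool = False                   # LOH at the affected BRCA gene


def is_brca_deficient(ev):
    """Apply the three criteria; (2) and (3) additionally require LOH."""
    return (
        ev.deep_deletion
        or (ev.mutation and ev.loh)
        or (ev.promoter_methylation and ev.loh)
    )


def hrd_sum(loh, tai, lst):
    """Unweighted numeric sum of the three genomic scar scores."""
    return loh + tai + lst


# Made-up values for illustration.
total = hrd_sum(loh=10, tai=15, lst=20)
deficient = is_brca_deficient(BrcaEvidence(promoter_methylation=True, loh=True))
```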
Data availability
The data sets generated during the current study are available from the corresponding author on reasonable request.
Code availability
The code/algorithm for performing the experiments is available for download at https://github.com/sztup/scarHRD.
Acknowledgements
This work was supported by the Research and Technology Innovation Fund (KTIA_NAP_13-2014-0021 to Z.S.); Breast Cancer Research Foundation and the Novo Nordisk Foundation Interdisciplinary Synergy Programme Grant (NNF15OC0016584 to I.C. and Z.S.), by the ÚNKP-17-4-III-SE-63 New National Excellence Program of the Ministry of Human Capacities to L.R. and by Tesaro Inc. The results shown here are based upon data generated by the TCGA Research Network: http://cancergenome.nih.gov/.
Author contributions
Conception, design, writing, and review of the manuscript: Zs.S., M.D., M.K., L.R., I.C., F.F., N.J.B., A.C.E., A.S., and Zo.S. Development of methodology: Zs.S., M.D., M.K. Analysis and interpretation of data: Zs.S., M.D., M.K., Zo.S.
Competing interests
N.J.B., A.C.E., and Zo.S. are listed as co-inventors on a patent on telomeric allelic imbalance, which is owned by Children’s Hospital Boston and licensed to Myriad Genetics. The remaining authors declare no competing interests.
Footnotes
Electronic supplementary material
Supplementary information accompanies the paper on the npj Breast Cancer website (10.1038/s41523-018-0066-6).
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Lord CJ, Ashworth A. BRCAness revisited. Nat. Rev. Cancer. 2016;16:110–120. doi: 10.1038/nrc.2015.21.
2. Birkbak NJ, et al. Telomeric allelic imbalance indicates defective DNA repair and sensitivity to DNA-damaging agents. Cancer Discov. 2012;2:366–375. doi: 10.1158/2159-8290.CD-11-0206.
3. Abkevich V, et al. Patterns of genomic loss of heterozygosity predict homologous recombination repair defects in epithelial ovarian cancer. Br. J. Cancer. 2012;107:1776–1782. doi: 10.1038/bjc.2012.451.
4. Popova T, et al. Ploidy and large-scale genomic instability consistently identify basal-like breast carcinomas with BRCA1/2 inactivation. Cancer Res. 2012;72:5454–5462. doi: 10.1158/0008-5472.CAN-12-1470.
5. Telli ML, et al. Homologous recombination deficiency (HRD) score predicts response to platinum-containing neoadjuvant chemotherapy in patients with triple-negative breast cancer. Clin. Cancer Res. 2016;22:3764–3773. doi: 10.1158/1078-0432.CCR-15-2477.
6. Davies H, et al. HRDetect is a predictor of BRCA1 and BRCA2 deficiency based on mutational signatures. Nat. Med. 2017;23:517–525. doi: 10.1038/nm.4292.
7. Mutter RW, et al. Bi-allelic alterations in DNA repair genes underpin homologous recombination DNA repair defects in breast cancer. J. Pathol. 2017;242:165–177. doi: 10.1002/path.4890.
8. Mirza MR, et al. Niraparib maintenance therapy in platinum-sensitive, recurrent ovarian cancer. N. Engl. J. Med. 2016;375:2154–2164. doi: 10.1056/NEJMoa1611310.
9. Zhao EY, et al. Homologous recombination deficiency and platinum-based therapy outcomes in advanced breast cancer. Clin. Cancer Res. 2017;23:7521–7530. doi: 10.1158/1078-0432.CCR-17-1941.
10. Marquard AM, et al. Pan-cancer analysis of genomic scar signatures associated with homologous recombination deficiency suggests novel indications for existing cancer drugs. Biomark. Res. 2015;3:9. doi: 10.1186/s40364-015-0033-4.
11. Favero F, et al. Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data. Ann. Oncol. 2015;26:64–70. doi: 10.1093/annonc/mdu479.
12. Li Q, Wang K. InterVar: clinical interpretation of genetic variants by the 2015 ACMG-AMP guidelines. Am. J. Hum. Genet. 2017;100:267–280. doi: 10.1016/j.ajhg.2017.01.004.