PLRS: a flexible tool for the joint analysis of DNA copy number and mRNA expression data
Journal Article
1Department of Mathematics, VU University, De Boelelaan 1081a, 1081HV Amsterdam and 2Department of Epidemiology and Biostatistics, VU University Medical Center, 1007MB Amsterdam, The Netherlands
*To whom correspondence should be addressed.
Received: 23 November 2012; Revision received: 11 January 2013; Accepted: 12 February 2013; Published: 17 February 2013
Abstract
Summary: DNA copy number and mRNA expression are commonly used data types in cancer studies. Available software for integrative analysis arbitrarily fixes the parametric form of the association between the two molecular levels and hence offers no opportunity for modelling it. We present a new tool for flexible modelling of this association. PLRS uses a wide class of interpretable models, including popular ones, and incorporates prior biological knowledge. It can identify the gene-specific type of relationship between gene copy number and mRNA expression. Moreover, it tests the strength of the association and provides confidence intervals. We illustrate PLRS using glioblastoma data from The Cancer Genome Atlas.
Availability and implementation: PLRS is implemented as an R package and available from Bioconductor (as of version 2.12; http://bioconductor.org). Additional code for parallel computations is available as Supplementary Material.
Contact: g.g.r.leday@vu.nl
Supplementary information: Supplementary data are available at Bioinformatics online.
1 INTRODUCTION
DNA copy number aberrations are characteristic of cancer cells. These aberrations are gains and losses of chromosomal DNA, which may alter expression levels of mRNA transcripts. The identification of genes for which an abnormal copy number affects gene expression is important in cancer studies, as these genes are likely to be relevant for tumourigenesis. Here, we present a new tool for exploratory and confirmatory analysis of such effects.
For a given gene, copy number and mRNA expression are generally believed to be concordant. The exact form of the association is usually not established. In fact, the shape is likely to differ between genes because of the presence of different (post-)transcriptional regulatory mechanisms. Tools that investigate the interaction between the two molecular levels help to better understand these regulatory mechanisms.
Numerous software packages have been proposed for joint analysis of copy number and gene expression data (Chari et al., 2008; Lê Cao et al., 2009; Lee and Kim, 2009; Louhimo and Hautaniemi, 2011; Salari et al., 2010; van Wieringen et al., 2006). However, most of these fix the form of the association between DNA and RNA a priori, typically as a linear or piecewise constant function. Hence, these approaches do not permit investigation or identification of the shape of the association. Recently, the need for more subtle models that reflect the biological mechanisms between the two molecular levels has been highlighted (Leday et al., 2013; Nemes et al., 2012; Solvang et al., 2011). Here, we describe the R package PLRS, which implements the framework recently proposed by Leday et al. (2013). PLRS uses piecewise linear regression splines, which allow multiple line segments and form a wide class of interpretable models that includes the linear and piecewise constant ones. It enforces concordance by restricting relevant model parameters. In addition, PLRS tests the strength of the overall association, identifies its functional shape and provides confidence intervals for the estimated curve. We illustrate PLRS using a dataset of 160 glioblastoma samples obtained from The Cancer Genome Atlas (TCGA).
2 MODEL
PLRS models _cis_-relationships between copy number and mRNA expression by piecewise linear regression splines (Leday et al., 2013). The relevance of this class of models is multifold. First, unlike other methods, PLRS combines copy number data from various steps of the preprocessing, namely, the segmented and called data (van de Wiel et al., 2011). Segmented data are continuous (log2-values) and provide the (relative) amount of DNA copies (gene dosage), whereas called data represent discrete states associated with the various types of copy number aberration; the biological literature commonly distinguishes four of these: ‘loss’ (less than two copies of genomic DNA), ‘normal’ (two copies), ‘gain’ (three to four copies) and ‘amplification’ (more than four copies). Second, PLRS allows the effect of DNA on mRNA to differ across types of aberrations. This is biologically plausible: the efficacy of mechanisms that compensate for genomic aberrations may differ between losses, gains and amplifications. Third, good interpretability is ensured by the piecewise linearity of the model and a set of restrictions on the parameters. For example, the restrictions require copy number to be concordant with gene expression and prevent ‘normal’ copy number from severely altering gene expression.
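To make the model class concrete, the following Python sketch evaluates a piecewise linear spline in which each knot contributes a change in intercept and a change in slope. It is an illustration only and not the API of the R package: the function names, the knot positions and the parameter values are hypothetical, and the concordance check encodes one plausible reading of the restriction (expression never decreases as dosage increases), which may differ from the exact constraints used by PLRS.

```python
def piecewise_curve(x, knots, theta0, theta1, jumps, slope_changes):
    """Evaluate a piecewise linear spline at gene dosage x.

    Each knot k adds a jump in intercept once x >= k and a change in
    slope through the hinge term (x - k). Knots are assumed sorted.
    """
    value = theta0 + theta1 * x
    for k, jump, change in zip(knots, jumps, slope_changes):
        if x >= k:
            value += jump + change * (x - k)
    return value


def is_concordant(theta1, jumps, slope_changes):
    """Return True if the curve is non-decreasing in dosage.

    Assumption for illustration: concordance means every jump and the
    slope within every segment are non-negative.
    """
    slope = theta1
    if slope < 0:
        return False
    for jump, change in zip(jumps, slope_changes):
        slope += change  # slope within the segment that follows this knot
        if jump < 0 or slope < 0:
            return False
    return True


# Hypothetical boundaries between loss/normal, normal/gain, gain/amplification
knots = [-0.3, 0.3, 1.0]
# Flat expression until amplification, then a jump and a steep increase
flat_then_amplified = dict(theta0=1.0, theta1=0.0,
                           jumps=[0.0, 0.0, 0.5],
                           slope_changes=[0.0, 0.0, 2.0])

print(piecewise_curve(0.0, knots, **flat_then_amplified))
print(piecewise_curve(1.5, knots, **flat_then_amplified))
print(is_concordant(0.0, [0.0, 0.0, 0.5], [0.0, 0.0, 2.0]))
```

Setting all jumps and slope changes to zero gives the linear model, and setting all slopes to zero gives the piecewise constant model, which is why both popular models are special cases of this class.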
In this context, the R package PLRS implements various statistical procedures to detect which gene copy number abnormalities alter the gene expression level, and how. Identification of the functional form of the association is achieved by model selection, which automatically merges copy number states when their association with mRNA expression can be captured with one regression line. Simultaneous confidence intervals on the selected curve are provided for a more detailed description. Finally, a statistical test evaluates the significance of the overall association by testing the null hypothesis that copy number does not affect mRNA expression, which corresponds to a single horizontal line.
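The idea of merging states through model selection can be illustrated with a toy comparison between a single regression line and a two-segment model. The Python sketch below is a simplification under stated assumptions and does not reproduce the procedure of the package: it uses ordinary least squares without the parameter restrictions, a single hypothetical knot, and the Bayesian information criterion as the selection criterion, which is a choice made here for illustration. All names and data are hypothetical.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares(rows, y):
    """Fit y on the design rows via the normal equations; return (beta, RSS)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return beta, rss


def bic(rss, n, p):
    """Gaussian BIC up to an additive constant."""
    return n * math.log(rss / n) + p * math.log(n)


def select_shape(x, y, knot):
    """Choose between one line and two segments split at `knot` by lowest BIC."""
    candidates = {
        "one line": [[1.0, xi] for xi in x],
        "two segments": [[1.0, xi, 1.0 if xi >= knot else 0.0,
                          max(xi - knot, 0.0)] for xi in x],
    }
    scores = {}
    for name, rows in candidates.items():
        _, rss = least_squares(rows, y)
        scores[name] = bic(rss, len(x), len(rows[0]))
    return min(scores, key=scores.get), scores


# Toy data: dosage on a grid, with small deterministic alternating noise
xs = [i / 10 for i in range(-10, 11)]
noise = [0.05 if i % 2 == 0 else -0.05 for i in range(-10, 11)]
amplified = [4 * max(x - 0.5, 0.0) + e for x, e in zip(xs, noise)]
linear = [0.5 * x + e for x, e in zip(xs, noise)]

print(select_shape(xs, amplified, knot=0.5)[0])
print(select_shape(xs, linear, knot=0.5)[0])
```

In this toy setting, the two-segment model is retained only when the extra intercept and slope reduce the residual sum of squares enough to offset the penalty for two additional parameters; otherwise the two states are described by one line.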
3 RESULTS
We applied PLRS to a dataset of 160 glioblastoma tumour samples obtained from TCGA (http://cancergenome.nih.gov/; Verhaak et al., 2010) for which copy number (Agilent CGH Microarray 244A) and mRNA expression (Agilent 244K platform) were available. We found that for many known cancer genes, the expression level is strongly associated with DNA aberrations (cf. Supplementary Material). Figure 1 depicts the DNA–mRNA association for four genes, including the known cancer genes MET, ERCC2 and AGAP2. The relationships clearly differ between genes, demonstrating that the flexibility of the PLRS model yields new insights into the association. For MET, the effect of amplifications exceeds that of gains more than proportionally. For ERCC2, the expression level differs on average between samples with a loss and samples with normal copy number, and expression increases linearly with dosage. Amplifications of AGAP2 have a strong effect on mRNA expression, whereas gains have none. The effect as defined by PLRS is broad: it is expressed by both an intercept and a slope for each copy number aberration state. The variety of models resulting from PLRS contrasts with most other methods, which impose a single parametric form on all genes. Our method lets the data decide which form is most appropriate. As a consequence, PLRS has more power than other standard methods for detecting relatively large effects occurring in small subgroups of samples (Leday et al., 2013). Note that other non-linear techniques, e.g. those based on mutual information, can be competitive but are less interpretable.
Fig. 1. DNA–mRNA associations for four genes in the TCGA dataset. _X_-axis: gene dosage (segmented values), _y_-axis: mRNA gene expression. Copy number states are indicated by symbols: loss (open inverted triangles), normal (open circles), gain (open triangles) and amplification (multiplication symbols). Grey surfaces correspond to 95% uniform confidence bands. The top left value corresponds to the _P_-value of the PLRS test.
4 CONCLUSION
PLRS is a tool for flexible modelling of the association between DNA copy number and mRNA expression. We demonstrated its potential to reveal interesting relationships. It is particularly useful for (i) a detailed understanding of the relationship between DNA copy number and mRNA expression and (ii) powerful detection of copy number-induced sample subgroup-specific effects, thereby acknowledging heterogeneity of many cancers. The software can also be used for studying the effect of DNA copy number on microRNA expression.
Funding: The Center for Medical Systems Biology (CMSB), established by the Netherlands Genomics Initiative/Netherlands Organization for Scientific Research (NGI/NWO).
Conflict of Interest: none declared.
REFERENCES
Chari et al. SIGMA2: a system for the integrative genomic multi-dimensional analysis of cancer genomes, epigenomes, and transcriptomes, BMC Bioinformatics, 2008, vol. 9, pg. 422
Lê Cao et al. integrOmics: an R package to unravel relationships between two omics datasets, Bioinformatics, 2009, vol. 25 (pg. 2855-2856)
Leday et al. Modeling association between DNA copy number and gene expression with constrained piecewise linear regression splines, Ann. Appl. Stat., 2013, doi:10.1214/12-AOAS605
Lee and Kim. CHESS (CgHExpreSS): a comprehensive analysis tool for the analysis of genomic alterations and their effects on the expression profile of the genome, BMC Bioinformatics, 2009, vol. 10, pg. 424
Louhimo and Hautaniemi. CNAmet: an R package for integrating copy number, methylation and expression data, Bioinformatics, 2011, vol. 27 (pg. 887-888)
Nemes et al. Segmented regression, a versatile tool to analyze mRNA levels in relation to DNA copy number aberrations, Gene Chromosome Cancer, 2012, vol. 51 (pg. 77-82)
Salari et al. DR-integrator: a new analytic tool for integrating DNA copy number and gene expression data, Bioinformatics, 2010, vol. 26 (pg. 414-416)
Solvang et al. Linear and non-linear dependencies between copy number aberrations and mRNA expression reveal distinct molecular pathways in breast cancer, BMC Bioinformatics, 2011, vol. 12, pg. 197
van de Wiel et al. Preprocessing and downstream analysis of microarray DNA copy number profiles, Brief. Bioinform., 2011, vol. 12 (pg. 10-21)
van Wieringen et al. ACE-it: a tool for genome-wide integration of gene dosage and RNA expression data, Bioinformatics, 2006, vol. 22 (pg. 1919-1920)
Verhaak et al. Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1, Cancer Cell, 2010, vol. 17 (pg. 98-110)
Author notes
Associate Editor: Martin Bishop
© The Author 2013. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com