PLRS: a flexible tool for the joint analysis of DNA copy number and mRNA expression data


1Department of Mathematics, VU University, De Boelelaan 1081a, 1081HV Amsterdam and 2Department of Epidemiology and Biostatistics, VU University Medical Center, 1007MB Amsterdam, The Netherlands

*To whom correspondence should be addressed.


Received: 23 November 2012; Revision received: 11 January 2013; Accepted: 12 February 2013; Published: 17 February 2013


Abstract

Summary: DNA copy number and mRNA expression are commonly used data types in cancer studies. Available software for integrative analysis arbitrarily fixes the parametric form of the association between the two molecular levels and hence offers no opportunity to model it. We present a new tool for flexible modelling of this association. PLRS uses a wide class of interpretable models, including popular ones, and incorporates prior biological knowledge. It can identify the gene-specific type of relationship between gene copy number and mRNA expression. Moreover, it tests the strength of the association and provides confidence intervals. We illustrate PLRS using glioblastoma data from The Cancer Genome Atlas.

Availability and implementation: PLRS is implemented as an R package and available from Bioconductor (as of version 2.12; http://bioconductor.org). Additional code for parallel computations is available as Supplementary Material.

Contact: g.g.r.leday@vu.nl

Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

DNA copy number aberrations are characteristic of cancer cells. These aberrations are gains and losses of chromosomal DNA, which may alter expression levels of mRNA transcripts. The identification of genes for which an abnormal copy number affects gene expression is important in cancer studies, as these genes are likely to be relevant for tumourigenesis. Here, we present a new tool for exploratory and confirmatory analysis of such effects.

For a given gene, copy number and mRNA expression are generally believed to be concordant. The exact form of the association is usually not established. In fact, the shape is likely to differ between genes because of the presence of different (post-) transcriptional regulatory mechanisms. Tools that investigate the interaction between the two molecular levels help to better understand these regulatory mechanisms.

Numerous software packages have been proposed for joint analysis of copy number and gene expression data (Chari et al., 2008; Lê Cao et al., 2009; Lee and Kim, 2009; Louhimo and Hautaniemi, 2011; Salari et al., 2010; van Wieringen et al., 2006). However, most of these fix the form of the association between DNA and RNA a priori, typically as a linear or piecewise constant function. Hence, these approaches do not permit investigation or identification of the shape of the association. Recently, the need for more subtle models has been highlighted (Leday et al., 2013; Nemes et al., 2012; Solvang et al., 2011) to reflect the biological mechanisms between the two molecular levels. Here, we describe the R package PLRS that implements the framework recently proposed by Leday et al. (2013). PLRS uses piecewise linear regression splines, which allow multiple line segments and form a wide class of interpretable models that includes the linear and piecewise constant ones. It enforces concordance by restricting relevant model parameters. In addition, PLRS tests the strength of the overall association, identifies its functional shape and provides confidence intervals for the estimated curve. We illustrate PLRS using a dataset of 160 glioblastoma samples obtained from The Cancer Genome Atlas (TCGA).

2 MODEL

PLRS models _cis_-relationships between copy number and mRNA expression by piecewise linear regression splines (Leday et al., 2013). The relevance of this class of models is multifold. First, unlike other methods, PLRS combines copy number data from various steps of the preprocessing, namely, the segmented and called data (van de Wiel et al., 2011). Segmented data are continuous (log2-values) and provide the (relative) amount of DNA copies (gene dosage), whereas called data represent discrete states associated with the various types of copy number aberration; the biological literature commonly distinguishes four of these: ‘loss’ (fewer than two copies of genomic DNA), ‘normal’ (two copies), ‘gain’ (three to four copies) and ‘amplification’ (more than four copies). Second, PLRS allows the effect of DNA on mRNA to differ across types of aberrations. This is biologically plausible: the efficacy of mechanisms that compensate for genomic aberrations may differ between losses, gains and amplifications. Third, good interpretability is ensured by the piecewise linearity of the model and a set of restrictions on the parameters. For example, the restrictions require that copy number be concordant with gene expression and that ‘normal’ copy number cannot severely alter gene expression.
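As a rough illustration of the model class described above, the following Python sketch evaluates a piecewise linear spline that may jump and change slope at each boundary between copy number states, and checks one possible reading of the concordance restriction (non-negative jumps and non-negative slope in every segment). The parameterisation, the function names `spline_value` and `is_concordant`, and the knot and coefficient values are assumptions made for this example; they are not taken from the PLRS package, and the exact restrictions are given in Leday et al. (2013).

```python
def spline_value(x, knots, theta):
    """Evaluate a piecewise linear spline at gene dosage x.

    theta = (intercept, slope, [(jump_j, slope_change_j), ...]) with one
    (jump, slope change) pair per knot. Knots are hypothetical dosage
    boundaries between adjacent copy number states.
    """
    intercept, slope, changes = theta
    y = intercept + slope * x
    for knot, (jump, slope_change) in zip(knots, changes):
        if x >= knot:
            # Past this knot: add the jump and the extra slope.
            y += jump + slope_change * (x - knot)
    return y


def is_concordant(theta):
    """Assumed reading of concordance: no downward jump and no
    negative slope in any segment."""
    _, slope, changes = theta
    if slope < 0:
        return False
    for jump, slope_change in changes:
        slope += slope_change  # slope of the segment right of the knot
        if jump < 0 or slope < 0:
            return False
    return True


# Hypothetical example: boundaries loss/normal, normal/gain, gain/amplification.
example_knots = [-0.3, 0.3, 1.0]
# Slope 1 for losses, flat for normal, jump then slope 1 for gains,
# slope 3 for amplifications.
example_theta = (0.0, 1.0, [(0.0, -1.0), (0.5, 1.0), (0.0, 2.0)])
```

In this example the ‘normal’ segment is flat, which mirrors the restriction that normal copy number should not severely alter expression, while gains and amplifications each get their own intercept shift and slope.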

In this context, the R package PLRS implements various statistical procedures to detect which gene copy number abnormalities alter the gene expression level, and how. Identification of the functional form of the association is achieved by model selection, which automatically merges copy number states when their association with mRNA expression can be captured with one regression line. Simultaneous confidence intervals on the selected curve are provided for a more detailed description. Finally, a statistical test evaluates the significance of the overall association by testing the null hypothesis that copy number does not affect mRNA expression, which corresponds to a single horizontal line.
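The idea of merging copy number states during model selection can be sketched as follows. This Python example compares a single regression line for all samples ("merged") with a separate line per called state ("split") and keeps the one with the lower BIC. It is a simplified stand-in, not the procedure implemented in PLRS: it ignores the parameter restrictions, considers only two candidate models, and the choice of BIC as well as the names `fit_line`, `bic` and `select_model` are assumptions made for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b, rss)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0  # flat line if x does not vary
    a = mean_y - b * mean_x
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, rss


def bic(rss, n, n_params):
    """Gaussian BIC up to an additive constant; the floor avoids log(0)."""
    return n * math.log(max(rss, 1e-12) / n) + n_params * math.log(n)


def select_model(dosage, expr, states):
    """Return the name of the lower-BIC model and both BIC scores."""
    n = len(expr)
    rss_merged = fit_line(dosage, expr)[2]
    rss_split, n_params = 0.0, 0
    for state in sorted(set(states)):
        xs = [x for x, s in zip(dosage, states) if s == state]
        ys = [y for y, s in zip(expr, states) if s == state]
        rss_split += fit_line(xs, ys)[2]
        n_params += 2  # one intercept and one slope per state
    scores = {
        "merged": bic(rss_merged, n, 2),
        "split": bic(rss_split, n, n_params),
    }
    return min(scores, key=scores.get), scores
```

When the samples of all states lie on one line, the penalty for extra parameters favours the merged model; when one state has a clearly different slope, the drop in residual sum of squares outweighs the penalty and the states are kept separate.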

3 RESULTS

We applied PLRS to a dataset of 160 glioblastoma tumour samples obtained from TCGA (http://cancergenome.nih.gov/; Verhaak et al., 2010) for which copy number (Agilent CGH Microarray 244A) and mRNA expression (Agilent 244K platform) were available. We found that for many known cancer genes, the expression level is strongly associated with DNA aberrations (cf. Supplementary Material). Figure 1 depicts the DNA–mRNA association for four genes, including known cancer genes MET, ERCC2 and AGAP2. The relationships clearly differ, demonstrating that the flexibility of the PLRS model allows new insights into the association. For gene MET, we observe that the effect of amplifications extends that of gains more than proportionally. For ERCC2, the expression level of samples with loss and normal copy number differs on average and expression increases linearly with dosage. Amplifications of gene AGAP2 have a strong effect on mRNA expression, whereas gains have none. The effect as defined by PLRS is broad and expressed by both an intercept and a slope for each copy number aberration state. The variety of models resulting from PLRS contrasts with most other methods, which impose a single parametric form on all genes. Our method lets the data decide what is most appropriate. As a consequence, PLRS has more power than other standard methods for detecting relatively large effects occurring in small subgroups of samples (Leday et al., 2013). Note that other non-linear techniques, e.g. based on mutual information, can be competitive but are less interpretable.

Fig. 1.

DNA–mRNA associations for four genes in the TCGA dataset. _X_-axis: Gene dosage (segmented values), _y_-axis: mRNA gene expression. Copy number states are indicated by symbols: loss (open inverted triangles), normal (open circles), gain (open triangles) and amplification (multiplication symbols). Grey surfaces correspond to 95% uniform confidence bands. The top left value corresponds to the _P_-value of the PLRS test

4 CONCLUSION

PLRS is a tool for flexible modelling of the association between DNA copy number and mRNA expression. We demonstrated its potential to reveal interesting relationships. It is particularly useful for (i) a detailed understanding of the relationship between DNA copy number and mRNA expression and (ii) powerful detection of copy number-induced effects that are specific to subgroups of samples, thereby acknowledging the heterogeneity of many cancers. The software can also be used for studying the effect of DNA copy number on microRNA expression.

Funding: The Center for Medical Systems Biology (CMSB), established by the Netherlands Genomics Initiative/Netherlands Organization for Scientific Research (NGI/NWO).

Conflict of Interest: none declared.

REFERENCES

Chari et al. (2008) SIGMA2: a system for the integrative genomic multi-dimensional analysis of cancer genomes, epigenomes, and transcriptomes. BMC Bioinformatics, 9, 422.

Lê Cao et al. (2009) integrOmics: an R package to unravel relationships between two omics datasets. Bioinformatics, 25, 2855-2856.

Leday et al. (2013) Modeling association between DNA copy number and gene expression with constrained piecewise linear regression splines. Ann. Appl. Stat., doi:10.1214/12-AOAS605.

Lee and Kim (2009) CHESS (CgHExpreSS): a comprehensive analysis tool for the analysis of genomic alterations and their effects on the expression profile of the genome. BMC Bioinformatics, 10, 424.

Louhimo and Hautaniemi (2011) CNAmet: an R package for integrating copy number, methylation and expression data. Bioinformatics, 27, 887-888.

Nemes et al. (2012) Segmented regression, a versatile tool to analyze mRNA levels in relation to DNA copy number aberrations. Gene Chromosome Cancer, 51, 77-82.

Salari et al. (2010) DR-integrator: a new analytic tool for integrating DNA copy number and gene expression data. Bioinformatics, 26, 414-416.

Solvang et al. (2011) Linear and non-linear dependencies between copy number aberrations and mRNA expression reveal distinct molecular pathways in breast cancer. BMC Bioinformatics, 12, 197.

van de Wiel et al. (2011) Preprocessing and downstream analysis of microarray DNA copy number profiles. Brief Bioinform., 12, 10-21.

van Wieringen et al. (2006) ACE-it: a tool for genome-wide integration of gene dosage and RNA expression data. Bioinformatics, 22, 1919-1920.

Verhaak et al. (2010) Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer cell, 17, 98-110.

Author notes

Associate Editor: Martin Bishop

© The Author 2013. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
