Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data

Journal Article

Valentina Boeva, Tatiana Popova, Kevin Bleakley, Pierre Chiche, Julie Cappo, Gudrun Schleiermacher, Isabelle Janoueix-Lerosey, Olivier Delattre, Emmanuel Barillot

1Institut Curie, 2INSERM, U900, Bioinformatics and Computational Systems Biology of Cancer, Paris 75248, 3Mines ParisTech, Fontainebleau 77300, 4INSERM, U830, Genetics and Biology of Cancers, Paris 75248 and 5INRIA Saclay, Orsay 91893, France

* To whom correspondence should be addressed.

Received: 09 September 2011
Revision received: 03 November 2011
Accepted: 29 November 2011
Published: 06 December 2011

Valentina Boeva, Tatiana Popova, Kevin Bleakley, Pierre Chiche, Julie Cappo, Gudrun Schleiermacher, Isabelle Janoueix-Lerosey, Olivier Delattre, Emmanuel Barillot, Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data, Bioinformatics, Volume 28, Issue 3, February 2012, Pages 423–425, https://doi.org/10.1093/bioinformatics/btr670

Abstract

Summary: More and more cancer studies use next-generation sequencing (NGS) data to detect various types of genomic variation. However, even when researchers have such data at hand, single-nucleotide polymorphism arrays have been considered necessary to assess copy number alterations and especially loss of heterozygosity (LOH). Here, we present the tool Control-FREEC that enables automatic calculation of copy number and allelic content profiles from NGS data, and consequently predicts regions of genomic alteration such as gains, losses and LOH. Taking as input aligned reads, Control-FREEC constructs copy number and B-allele frequency profiles. The profiles are then normalized, segmented and analyzed in order to assign genotype status (copy number and allelic content) to each genomic region. When a matched normal sample is provided, Control-FREEC discriminates somatic from germline events. Control-FREEC is able to analyze overdiploid tumor samples and samples contaminated by normal cells. Low mappability regions can be excluded from the analysis using provided mappability tracks.

Availability: C++ source code is available at: http://bioinfo.curie.fr/projects/freec/

Contact: freec@curie.fr

Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

Cancer genomes often display copy number alterations (CNAs) and/or losses of heterozygosity (LOH) (Hanahan and Weinberg, 2011). Genetic abnormalities in specific regions may be related to the aggressiveness of a cancer and be associated with clinical outcomes (Caren et al., 2010; Suzuki et al., 2000).

To detect CNA and LOH regions, single-nucleotide polymorphism (SNP) arrays have recently been widely used (Popova et al., 2009). Meanwhile, next-generation sequencing (NGS) has been moving to replace SNP arrays for the prediction of CNAs (Boeva et al., 2010). A recent study presented ExomeCNV, a tool to predict CNAs and LOH using exome sequencing data (Sathirapongsasuti et al., 2011). However, detection of LOH regions and, more generally, prediction of the genotype status (copy number and allelic content) of an altered region using whole-genome sequencing data has remained an unsolved problem. The main challenges are non-uniform read coverage of genomic positions [for example, due to different mappability and GC-content (Boeva et al., 2010)] and alignment bias (coverage of the reference allele is usually higher than coverage of the alternative allele). As a result, the signal is noisier and more difficult to process than in the case of SNP arrays.

Here, we present Control-FREEC (Control-FREE Copy number and allelic content caller), a tool that annotates genotypes and discovers CNAs and LOH. Control-FREEC inherits many features from FREEC (Boeva et al., 2010), namely assessment of copy number variation and evaluation of contamination by normal cells, as well as the general methodology of the GAP algorithm for SNP arrays (Popova et al., 2009). Control-FREEC takes aligned reads as input, constructs and normalizes the copy number profile, constructs the B-allele frequency (BAF) profile, segments both profiles, ascribes a genotype status to each segment using both copy number and allelic frequency information, and then annotates genomic alterations. If a control (matched normal) sample is available, Control-FREEC distinguishes somatic variants from germline ones.

2 METHODS

Workflow: the workflow of Control-FREEC consists of three steps: (i) calculation and segmentation of copy number profiles; (ii) calculation and segmentation of smoothed BAF profiles; (iii) prediction of final genotype status, i.e. copy number and allelic content for each segment (for example, A, AB, AAB, etc.).

First, we combine the breakpoints obtained from both the copy number and the median BAF segmentations to get genomic segments that each presumably have a single status. Second, the copy number status of each segment is determined as described previously (Boeva et al., 2010). If the CNA is present in most of the cells, there is no ambiguity in determining the exact copy number of the region (see Supplementary Materials for more details on the strategy when subclones or normal contamination are present). Third, given the copy number of the region, we fit Gaussian mixture models (GMMs) with fixed means to the observed BAF values and select the model that provides the highest log-likelihood. For example, for a region with a copy number of two, we fit a two-component model (mixture of ‘AA’ and ‘BB’ alleles) and a three-component model (‘AA’, ‘AB’ and ‘BB’, with a condition on the minimal weight of ‘AB’). The component means in the GMM depend on the level of contamination by normal DNA (Supplementary Materials).
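The model-selection step for a copy number of two can be sketched as follows. This is a minimal illustration, not the Control-FREEC implementation: the function names, the shared standard deviation `sigma`, the threshold `min_ab_weight` and the choice to estimate only the component weights by EM are all assumptions made for the example, and the means 0, 0.5 and 1 correspond to a sample without normal contamination.

```python
import math


def fit_fixed_mean_gmm(values, means, sigma=0.05, n_iter=50):
    """Fit a 1-D Gaussian mixture whose means and sigma are held fixed.

    Only the component weights are estimated (by EM).
    Returns (weights, log_likelihood).
    """
    k = len(means)
    weights = [1.0 / k] * k

    def density(x, mu):
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

    for _ in range(n_iter):
        totals = [0.0] * k
        for x in values:
            parts = [w * density(x, mu) for w, mu in zip(weights, means)]
            norm = max(sum(parts), 1e-300)  # guard against underflow
            for j in range(k):
                totals[j] += parts[j] / norm  # responsibility of component j
        weights = [t / len(values) for t in totals]

    log_lik = sum(
        math.log(max(sum(w * density(x, mu) for w, mu in zip(weights, means)), 1e-300))
        for x in values
    )
    return weights, log_lik


def call_allelic_status(baf_values, min_ab_weight=0.1):
    """Choose between 'AA/BB' (LOH) and 'AB' for a copy-number-2 segment."""
    _, two_ll = fit_fixed_mean_gmm(baf_values, [0.0, 1.0])
    three_w, three_ll = fit_fixed_mean_gmm(baf_values, [0.0, 0.5, 1.0])
    # The three-component model is admissible only if 'AB' carries enough weight.
    if three_w[1] >= min_ab_weight and three_ll > two_ll:
        return "AB"
    return "AA/BB"


# Toy BAF values: homozygous SNPs only, then with heterozygous SNPs added.
loh_segment = [0.0, 0.02, 0.05, 0.95, 0.98, 1.0] * 10
het_segment = loh_segment + [0.45, 0.5, 0.55] * 10
print(call_allelic_status(loh_segment))
print(call_allelic_status(het_segment))
```

Because the two-component model is nested in the three-component one, the latter can never fit much worse; the minimal-weight condition on ‘AB’ is what keeps it from being chosen when almost no SNPs sit near the heterozygous mean.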

Input and output: the input consists of a SAM pileup (http://samtools.sourceforge.net/pileup.shtml) and a dbSNP file. The control dataset is optional if a reference genome is provided. The output contains a list of CNAs and LOH regions as well as read count, copy number, BAF and genotype information for each window. If a control (matched normal) dataset is available, each event is annotated as somatic or germline.
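To make the notion of a BAF profile concrete, the sketch below computes the B-allele frequency at known SNP positions from per-allele read counts. It assumes the counts have already been extracted from the pileup at dbSNP positions; the input tuple layout, the function name `baf_profile` and the `min_depth` filter are hypothetical choices for this example and are not taken from the tool.

```python
def baf_profile(snp_counts, min_depth=10):
    """Compute B-allele frequencies at known SNP sites.

    snp_counts: iterable of (position, a_reads, b_reads), where a_reads and
    b_reads are the numbers of reads supporting the A (reference) and
    B (alternative) allele.
    Returns a list of (position, baf) for sites with enough coverage.
    """
    profile = []
    for position, a_reads, b_reads in snp_counts:
        depth = a_reads + b_reads
        if depth >= min_depth:  # skip poorly covered sites
            profile.append((position, b_reads / depth))
    return profile


counts = [(100, 15, 15), (200, 3, 2), (300, 0, 20)]
print(baf_profile(counts))
```

Sites with few reads give very coarse frequency estimates, which is why a depth filter of some kind is a natural design choice; the value used here is arbitrary.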

3 RESULTS

We applied Control-FREEC to detect CNAs and LOH regions in a tumor/normal dataset from a neuroblastoma patient (~30x coverage, unpublished data). Control-FREEC detected somatic CNA and LOH regions covering 75% of the tumor genome (Fig. 1) and was able to identify the genotype status despite contamination of the tumor sample by normal cells (the estimated proportion of tumor cells was 60%).
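As a worked illustration of how such contamination shifts the expected BAF, consider a standard two-population mixing argument (assumed here for illustration; the exact expressions used by the tool are given in the Supplementary Materials). Let p be the fraction of tumor cells, n the tumor copy number of a segment and n_B the number of B copies in the tumor, and assume that normal cells are diploid and heterozygous (AB) at the SNP. Then

\[
\mathrm{BAF} = \frac{p\,n_B + (1 - p)}{p\,n + 2\,(1 - p)}
\]

With p = 0.6, a copy-neutral LOH segment ‘BB’ (n = 2, n_B = 2) gives (1.2 + 0.4)/(1.2 + 0.8) = 0.8 rather than 1, and a single-copy gain ‘AAB’ (n = 3, n_B = 1) gives (0.6 + 0.4)/(1.8 + 0.8) ≈ 0.38 rather than 1/3.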

Fig. 1. Control-FREEC calculates copy number and BAF profiles and detects regions of copy number gain/loss and LOH regions. Tumor chromosomes 17 and 19 (bottom panels) versus ‘normal’ chromosomes (top panels; unpublished data). Predicted BAF and copy number profiles are shown in black. Gains, losses (left panels) and LOH (right panels) are shown in red, blue and light blue, respectively.

Our results agreed with the SNP-array analysis output. We obtained 95.4% consistency between the results of Control-FREEC and GAP (Popova et al., 2009), which we applied to SNP array data generated for the same tumor sample (Supplementary Materials).
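How the consistency figure was computed is described in the Supplementary Materials rather than here. One plausible definition, shown purely as an assumption, is the fraction of jointly covered bases on which two sets of calls assign the same genotype status. The function name `concordance` and the segment layout are hypothetical.

```python
def concordance(calls_a, calls_b):
    """Base-pair-weighted agreement between two segmentations.

    calls_a, calls_b: sorted, non-overlapping lists of (start, end, status)
    with half-open coordinates on the same chromosome.
    Returns the fraction of jointly covered bases with identical status.
    """
    i = j = 0
    shared = agree = 0
    while i < len(calls_a) and j < len(calls_b):
        a_start, a_end, a_status = calls_a[i]
        b_start, b_end, b_status = calls_b[j]
        overlap = min(a_end, b_end) - max(a_start, b_start)
        if overlap > 0:
            shared += overlap
            if a_status == b_status:
                agree += overlap
        # Advance whichever segment ends first.
        if a_end <= b_end:
            i += 1
        else:
            j += 1
    return agree / shared if shared else float("nan")


ngs_calls = [(0, 100, "AB"), (100, 200, "AA")]
array_calls = [(0, 120, "AB"), (120, 200, "AA")]
print(concordance(ngs_calls, array_calls))
```

In this toy case the two call sets disagree only on the 20 bases between the two breakpoints, so agreement is 180 out of 200 shared bases.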

4 CONCLUSION

Control-FREEC is a tool for automatic detection of CNAs and LOH regions using NGS data. It accurately calls genotype status even when no control experiment is available and/or the genome is polyploid. It corrects for GC-content and mappability biases. In the case of tumor samples, Control-FREEC is able to evaluate the level of contamination by normal cells. The software is written in C++ and freely available.

Funding: ‘Projet Incitatif et Collaboratif Bioinformatique et Biostatistiques’ of the Institut Curie; Ligue Nationale Contre le Cancer.

Conflict of Interest: none declared.

REFERENCES

Boeva et al. Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization, Bioinformatics, 2011, vol. 27 (pg. 268-269)

Caren et al. High-risk neuroblastoma tumors with 11q-deletion display a poor prognostic, chromosome instability phenotype with later onset, Proc. Natl Acad. Sci. USA, 2010, vol. 107 (pg. 4323-4328)

Hanahan and Weinberg. Hallmarks of cancer: the next generation, Cell, 2011, vol. 144 (pg. 646-674)

Harchaoui and Lévy-Leduc. Catching change-points with lasso, Adv. Neural Inform. Process. Syst., 2008, vol. 22 (pg. 617-624)

Popova et al. Genome Alteration Print (GAP): a tool to visualize and mine complex cancer genomic profiles obtained by SNP arrays, Genome Biol., 2009, vol. 10 pg. R128

Sathirapongsasuti et al. Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV, Bioinformatics, 2011, vol. 27 (pg. 2648-2654)

Sherry et al. dbSNP: the NCBI database of genetic variation, Nucleic Acids Res., 2001, vol. 29 (pg. 308-311)

Suzuki et al. An approach to analysis of large-scale correlations between genome changes and clinical endpoints in ovarian cancer, Cancer Res, 2000, vol. 60 (pg. 5382-5385)

Author notes

Associate Editor: Alex Bateman

© The Author(s) 2011. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
