Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data
Comparative Study
Weil R Lai et al. Bioinformatics. 2005.
Abstract
Motivation: Array Comparative Genomic Hybridization (CGH) can reveal chromosomal aberrations in the genomic DNA. These amplifications and deletions at the DNA level are important in the pathogenesis of cancer and other diseases. While a large number of approaches have been proposed for analyzing the large array CGH datasets, the relative merits of these methods in practice are not clear.
Results: We compare 11 different algorithms for analyzing array CGH data. These include both segment detection methods and smoothing methods, based on diverse techniques such as mixture models, Hidden Markov Models, maximum likelihood, regression, wavelets and genetic algorithms. We compute the Receiver Operating Characteristic (ROC) curves using simulated data to quantify sensitivity and specificity for various levels of signal-to-noise ratio and different sizes of abnormalities. We also characterize their performance on chromosomal regions of interest in a real dataset obtained from patients with Glioblastoma Multiforme. While comparisons of this type are difficult due to possibly sub-optimal choice of parameters in the methods, they nevertheless reveal general characteristics that are helpful to the biological investigator.
Figures
Fig. 1
Array-CGH algorithms on simulated aberrations of increasing width. Illustrated here as an example are the signal profiles consisting of five aberrations of 2, 5, 10, 20, and 40 probes long with an amplitude of 1. Gaussian noise N(0, 0.25^2) was added to the signal profile to generate the simulated data. Default settings for the algorithms were used when available; otherwise, appropriate parameters were selected or computed based on the program documentation and related papers.
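A minimal sketch of how a profile like the one in Fig. 1 could be simulated, using only the Python standard library. The aberration widths, the amplitude of 1, and a noise standard deviation of 0.25 follow the caption. The spacing between aberrations (`gap`), the random seed, and the function name `simulate_profile` are assumptions made for illustration and are not taken from the paper.

```python
import random


def simulate_profile(widths=(2, 5, 10, 20, 40), amplitude=1.0,
                     noise_sd=0.25, gap=20, seed=0):
    """Build a noiseless signal, a per-probe truth mask, and noisy data.

    `gap` (number of normal probes between aberrations) is a hypothetical
    choice; the caption does not state the spacing that was used.
    """
    rng = random.Random(seed)
    signal, truth = [], []
    for width in widths:
        # Normal (copy-neutral) stretch before each aberration.
        signal.extend([0.0] * gap)
        truth.extend([False] * gap)
        # Aberrant stretch of `width` probes at the given amplitude.
        signal.extend([amplitude] * width)
        truth.extend([True] * width)
    # Trailing normal stretch after the last aberration.
    signal.extend([0.0] * gap)
    truth.extend([False] * gap)
    # Add independent Gaussian noise with standard deviation `noise_sd`.
    data = [s + rng.gauss(0.0, noise_sd) for s in signal]
    return signal, truth, data


signal, truth, data = simulate_profile()
```

Lowering `amplitude` or raising `noise_sd` reduces the signal-to-noise ratio, which is the other axis along which the simulated comparisons vary.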
Fig. 2
Receiver operating characteristic (ROC) curves for array CGH algorithms measured at different aberration widths and signal-to-noise ratios (SNR). The x-axis is the false positive rate and the y-axis is the true positive rate. Red is CGHseg (Picard et al., 2005), orange is quantreg (Eilers and de Menezes, 2005), dark yellow is CLAC (Wang et al., 2005), green is GLAD (Hupe et al., 2004), blue is CBS (Olshen et al., 2004), violet is HMM (Fridlyand et al., 2004), salmon is wavelet (Hsu et al., 2005), black is lowess, light green is ChARM (Myers et al., 2004), brown is GA (Jong et al., 2003), and cyan is ACE (Lingjaerde et al., 2005). The curves were generated by measuring the true and false positive rates on simulated data at different threshold levels.
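The threshold sweep behind an ROC curve can be sketched as follows. This is an illustration under stated assumptions, not the paper's evaluation code: it assumes each algorithm yields one smoothed or segmented value per probe, that a probe is called aberrant when the absolute value exceeds the threshold, and that a per-probe truth mask is known from the simulation. The function name `roc_points` is hypothetical.

```python
def roc_points(scores, truth, thresholds):
    """Return (false positive rate, true positive rate) for each threshold.

    scores: per-probe output of an algorithm (assumed smoothed log-ratios).
    truth:  per-probe booleans, True where the probe lies in an aberration.
    """
    positives = sum(1 for y in truth if y)
    negatives = len(truth) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("truth must contain both aberrant and normal probes")
    points = []
    for t in thresholds:
        # A probe is called aberrant when its absolute score exceeds t,
        # so gains and losses are treated symmetrically (an assumption).
        called = [abs(s) > t for s in scores]
        tp = sum(1 for c, y in zip(called, truth) if c and y)
        fp = sum(1 for c, y in zip(called, truth) if c and not y)
        points.append((fp / negatives, tp / positives))
    return points


scores = [0.0, 0.1, 0.9, 1.1, 0.2, 0.6]
truth = [False, False, True, True, False, False]
print(roc_points(scores, truth, [0.5, 1.0]))  # [(0.25, 1.0), (0.0, 0.5)]
```

Sweeping the threshold from high to low traces the curve from the origin toward the upper right corner; a method whose curve lies closer to the upper left corner separates aberrant from normal probes better at that width and SNR.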
Fig. 3
Array-CGH profile of chromosome 13 in a Glioblastoma Multiforme sample (GBM31). This chromosome has a partial loss of low magnitude. Most algorithms in the study detect the loss. In particular, CGHseg, GLAD, CBS, and GA clearly identify the region.
Fig. 4
Array-CGH profile of the three amplifications around EGFR in GBM29. CGHseg, quantreg, GLAD, wavelet, and GA detect all three amplifications. CLAC, CBS, lowess, and ACE detect the first two amplifications as one larger region. ChARM detects the amplifications as one large region of gain, while HMM does not detect any.
Similar articles
- Quantile smoothing of array CGH data.
Eilers PH, de Menezes RX. Bioinformatics. 2005 Apr 1;21(7):1146-53. doi: 10.1093/bioinformatics/bti148. Epub 2004 Nov 30. PMID: 15572474
- High-resolution mapping of amplifications and deletions in pediatric osteosarcoma by use of CGH analysis of cDNA microarrays.
Squire JA, Pei J, Marrano P, Beheshti B, Bayani J, Lim G, Moldovan L, Zielenska M. Genes Chromosomes Cancer. 2003 Nov;38(3):215-25. doi: 10.1002/gcc.10273. PMID: 14506695
- Accurate detection of aneuploidies in array CGH and gene expression microarray data.
Myers CL, Dunham MJ, Kung SY, Troyanskaya OG. Bioinformatics. 2004 Dec 12;20(18):3533-43. doi: 10.1093/bioinformatics/bth440. Epub 2004 Jul 29. PMID: 15284100
- Microarray-based comparative genomic hybridization and its applications in human genetics.
Oostlander AE, Meijer GA, Ylstra B. Clin Genet. 2004 Dec;66(6):488-95. doi: 10.1111/j.1399-0004.2004.00322.x. PMID: 15521975 Review.
- [Microarray-based comparative genomic hybridization in the study of constitutional chromosomal abnormalities].
Béri-Dexheimer M, Bonnet C, Chambon P, Brochet K, Grégoire MJ, Jonveaux P. Pathol Biol (Paris). 2007 Feb;55(1):13-8. doi: 10.1016/j.patbio.2006.04.002. Epub 2006 May 11. PMID: 16697120 Review. French.
Cited by
- CONY: A Bayesian procedure for detecting copy number variations from sequencing read depths.
Wei YC, Huang GH. Sci Rep. 2020 Jun 26;10(1):10493. doi: 10.1038/s41598-020-64353-1. PMID: 32591545 Free PMC article.
- HiNT: a computational method for detecting copy number variations and translocations from Hi-C data.
Wang S, Lee S, Chu C, Jain D, Kerpedjiev P, Nelson GM, Walsh JM, Alver BH, Park PJ. Genome Biol. 2020 Mar 23;21(1):73. doi: 10.1186/s13059-020-01986-5. PMID: 32293513 Free PMC article.
- The coexistence of copy number variations (CNVs) and single nucleotide polymorphisms (SNPs) at a locus can result in distorted calculations of the significance in associating SNPs to disease.
Liu J, Zhou Y, Liu S, Song X, Yang XZ, Fan Y, Chen W, Akdemir ZC, Yan Z, Zuo Y, Du R, Liu Z, Yuan B, Zhao S, Liu G, Chen Y, Zhao Y, Lin M, Zhu Q, Niu Y, Liu P, Ikegawa S, Song YQ, Posey JE, Qiu G; DISCO (Deciphering disorders Involving Scoliosis and COmorbidities) Study; Zhang F, Wu Z, Lupski JR, Wu N. Hum Genet. 2018 Jul;137(6-7):553-567. doi: 10.1007/s00439-018-1910-3. Epub 2018 Jul 17. PMID: 30019117 Free PMC article.
- iSeg: an efficient algorithm for segmentation of genomic and epigenomic data.
Girimurugan SB, Liu Y, Lung PY, Vera DL, Dennis JH, Bass HW, Zhang J. BMC Bioinformatics. 2018 Apr 11;19(1):131. doi: 10.1186/s12859-018-2140-3. PMID: 29642840 Free PMC article.
- Allele-specific copy number estimation by whole exome sequencing.
Chen H, Jiang Y, Maxwell KN, Nathanson KL, Zhang N. Ann Appl Stat. 2017 Jun;11(2):1169-1192. doi: 10.1214/17-AOAS1043. Epub 2017 Jul 20. PMID: 28989557 Free PMC article.
References
- Autio R, Hautaniemi S, Kauraniemi P, Yli-Harja O, Astola J, Wolf M, Kallioniemi A. CGH-Plotter: MATLAB toolbox for CGH-data analysis. Bioinformatics. 2003;19:1714–1715.
- Bredel M, Bredel C, Juric D, Harsh GR, Vogel H, Recht LD, Sikic BI. High-resolution genome-wide mapping of genetic alterations in human glial brain tumors. Cancer Res. 2005;65:4088–4096.
- Brennan C, Zhang Y, Leo C, Feng B, Cauwels C, Aguirre AJ, Kim M, Protopopov A, Chin L. High-resolution global profiling of genomic alterations with long oligonucleotide microarray. Cancer Res. 2004;64:4744–4748.
- Cleveland WS. Robust locally weighted regression and smoothing scatterplots. J. Amer. Statist. Assoc. 1979;74:829–836.