A single-array preprocessing method for estimating full-resolution raw copy numbers from all Affymetrix genotyping arrays including GenomeWideSNP 5 & 6

Henrik Bengtsson et al. Bioinformatics. 2009.

Abstract

Motivation: High-resolution copy-number (CN) analysis has gained much attention in recent years, not only for identifying CN aberrations associated with a certain phenotype but also for identifying CN polymorphisms. For such studies to be successful and cost-effective, the statistical methods must be optimized. We propose CRMA v2, a single-array preprocessing method for estimating full-resolution total CNs. It is applicable to all Affymetrix genotyping arrays, including the recent ones that also contain non-polymorphic probes. A reference signal is needed only at the last step, when calculating relative CNs.

Results: As with our method for earlier generations of arrays, this one controls for allelic crosstalk, probe affinities and PCR fragment-length effects. In addition, it corrects for probe-sequence effects and for the co-hybridization of fragments digested by multiple enzymes, which occurs on the latest chips. We compare our method with Affymetrix's CN5 method and the dChip method by assessing how well each differentiates between various CN states at full resolution and at various amounts of smoothing. Although CRMA v2 is a single-array method, we observe that it performs as well as or better than alternative methods that use data from all arrays for their preprocessing. This shows that online analysis is possible in large-scale projects where additional arrays are introduced over time.
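The abstract notes that a reference signal is needed only when relative CNs are calculated. As a hedged illustration, the sketch below assumes the common convention of scaling a sample's total-CN signal by a diploid reference, C = 2·θ/θ_R, with θ_R taken as the locus-wise median over a pool of reference arrays. The function name `relative_copy_numbers` and the use of the median are assumptions made for illustration, not necessarily the paper's exact definition.

```python
import numpy as np


def relative_copy_numbers(theta, theta_ref_arrays):
    """Relative CN estimates for one array (hypothetical helper).

    theta: 1-D array of preprocessed total-CN signals, one value per locus.
    theta_ref_arrays: 2-D array (n_reference_arrays x n_loci) of the same
        signals for a pool of reference arrays, ASSUMED to be diploid.
    Returns C = 2 * theta / theta_R, where theta_R is the locus-wise median
    of the reference arrays (the median is an assumption; any robust
    per-locus reference would play the same role).
    """
    theta = np.asarray(theta, dtype=float)
    ref = np.asarray(theta_ref_arrays, dtype=float)
    theta_ref = np.median(ref, axis=0)  # one reference value per locus
    return 2.0 * theta / theta_ref      # diploid reference maps to CN = 2
```

Because each array is preprocessed on its own, a newly added array only needs this final division against an existing reference, which is what makes the single-array design suitable for projects where arrays arrive over time.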


Figures

Fig. 1.

ROC curves showing that CRMA v2 (solid red) separates CN = 1 from CN = 2 (ChrX) better than CN5 (dashed blue) and dChip* (solid light blue) at the full resolution (H = 1; A) as well as at various amounts of smoothing (H = 1, 2, 3, 4; B). The curves for H = 1 are in the lower right corner and the curves for H = 4 are in the upper left corner.
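The ROC comparison in Figure 1 can be sketched as follows. This is a minimal illustration under stated assumptions: smoothing with H is taken here to mean averaging non-overlapping blocks of H consecutive loci, and a locus is called as the lower CN state when its estimate falls at or below a threshold. The helper names `smooth_blocks` and `roc_curve` are hypothetical.

```python
import numpy as np


def smooth_blocks(cn, h):
    """Average non-overlapping blocks of h consecutive loci (h = 1: no smoothing).

    ASSUMPTION: this block-averaging scheme stands in for the paper's H.
    Trailing loci that do not fill a whole block are dropped.
    """
    cn = np.asarray(cn, dtype=float)
    n = (len(cn) // h) * h
    return cn[:n].reshape(-1, h).mean(axis=1)


def roc_curve(neg, pos):
    """Empirical ROC curve for detecting the lower CN state.

    neg: CN estimates for loci in the higher state (e.g. CN = 2).
    pos: CN estimates for loci in the lower state (e.g. CN = 1).
    A locus is called positive when its estimate is <= the threshold.
    Returns (fpr, tpr), one point per distinct observed value.
    """
    neg = np.asarray(neg, dtype=float)
    pos = np.asarray(pos, dtype=float)
    thresholds = np.unique(np.concatenate([neg, pos]))
    # searchsorted(side="right") counts sorted values <= each threshold.
    fpr = np.searchsorted(np.sort(neg), thresholds, side="right") / len(neg)
    tpr = np.searchsorted(np.sort(pos), thresholds, side="right") / len(pos)
    return fpr, tpr
```

For example, calling `roc_curve(smooth_blocks(cn2, h), smooth_blocks(cn1, h))` for h = 1, 2, 3, 4 yields one curve per smoothing level. As h grows, noise averages out and the curve moves toward the upper-left corner, which matches the pattern the Figure 1 caption describes.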

Fig. 2.

The true-positive rate as a function of resolution/smoothing at a 2.0% false-positive rate for the different methods. Results for the CN = 2 versus CN = 1 (ChrX) test are shown in (A) and those for the CN = 1 versus CN = 0 (ChrY) test in (B). Note the different scales. See Figure 1 for legends.
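Figure 2 summarizes each ROC curve by a single number: the true-positive rate at a fixed false-positive rate. A minimal sketch of that summary follows. The function name `tpr_at_fpr` is hypothetical, and the calling rule (estimates strictly below a threshold are called as the lower CN state) is an assumption chosen so that the false-positive rate never exceeds the requested level.

```python
import numpy as np


def tpr_at_fpr(neg, pos, fpr=0.02):
    """True-positive rate at a fixed false-positive rate (hypothetical helper).

    neg: CN estimates for loci known to be in the higher state (e.g. CN = 2).
    pos: CN estimates for loci known to be in the lower state (e.g. CN = 1).
    A locus is called 'lower state' when its estimate is strictly below the
    threshold; the threshold is the (k+1)-th smallest negative value, so at
    most k = floor(fpr * n_neg) negatives are called.
    """
    neg = np.sort(np.asarray(neg, dtype=float))
    pos = np.asarray(pos, dtype=float)
    k = int(np.floor(fpr * len(neg)))  # max number of false positives allowed
    threshold = neg[k] if k < len(neg) else np.inf
    return float(np.mean(pos < threshold))
```

Calling `tpr_at_fpr(neg, pos, fpr=0.0345)` gives the analogous summary at the 3.45% false-positive rate used in Figure 5. Evaluating it on smoothed data for each H traces out one line per method, as plotted in Figure 2.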

Fig. 3.

ROC curves showing that CRMA v2 differentiates between CN = 1 and CN = 0 (ChrY) as well as, or slightly worse than, CN5, and better than dChip*, at full resolution (A) as well as at various amounts of smoothing (B). See Figure 1 for legends.

Fig. 4.

The methods' performances on SNPs (left) and CN units (right) when testing for CN = 2 versus CN = 1 (ChrX; upper) and CN = 1 versus CN = 0 (ChrY; lower). The panels show the ROC curves for CRMA v2 (solid red), CN5 (dashed blue) and dChip* (solid light blue) at H = 1, 2, 3, 4 amounts of smoothing.

Fig. 5.

Distribution of true-positive rates for SNPs (A and C) and CN units (B and D) for CRMA v2 (left bars; red), CN5 (middle bars; blue) and dChip* (right bars; light blue) when testing for CN = 2 versus CN = 1 (ChrX; A and B) and CN = 1 versus CN = 0 (ChrY; C and D) while fixing the false-positive rate (3.45%). No smoothing was applied.

Fig. 6.

The region 100.1–107.5 Mb on Chr 1 in tumor-normal sample HCC1143 has a change point at ∼103.8 Mb, which separates a copy-neutral state (left) from a loss (right). There are 2242 and 2074 loci in these two states, respectively (4316 loci in total). The top three rows show the raw CNs [Equation (14)] of the CRMA v2, dChip and CN5 methods, respectively. A 500 kb safety region around the change point, whose data points were excluded from the evaluation, is highlighted by a dashed frame. The three panels in the bottom row show the ROC performance of the three methods at full resolution and after binning the CNs in non-overlapping windows of 5 kb and 20 kb, respectively. See Figure 4 for legends.
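The evaluation in Figure 6 combines two steps: excluding loci near the change point and averaging CNs in fixed-width genomic windows. A hedged sketch of both steps follows. The helper names are hypothetical, and the safety region is ASSUMED to be centred on the change point (250 kb on each side); the paper may define it differently.

```python
import numpy as np


def drop_safety_region(pos_bp, cn, change_point_bp, half_width_bp=250_000):
    """Remove loci within half_width_bp of the change point.

    ASSUMPTION: a 500 kb region centred on the change point.
    """
    pos_bp = np.asarray(pos_bp, dtype=float)
    cn = np.asarray(cn, dtype=float)
    keep = np.abs(pos_bp - change_point_bp) > half_width_bp
    return pos_bp[keep], cn[keep]


def bin_by_position(pos_bp, cn, width_bp):
    """Mean CN in non-overlapping genomic windows of width_bp base pairs.

    Windows start at the smallest position; empty windows are skipped.
    Returns (window_centres_bp, mean_cn).
    """
    pos_bp = np.asarray(pos_bp, dtype=float)
    cn = np.asarray(cn, dtype=float)
    start = pos_bp.min()
    idx = np.floor((pos_bp - start) / width_bp).astype(int)  # window index
    counts = np.bincount(idx)
    sums = np.bincount(idx, weights=cn)
    keep = counts > 0
    centres = start + (np.arange(len(counts)) + 0.5) * width_bp
    return centres[keep], sums[keep] / counts[keep]
```

To keep windows from mixing the two states, one would bin each side of the change point separately, for example the loci with `pos < change_point_bp` and those above it, using widths of 5_000 and 20_000 bp. The two sets of binned values can then be passed to an empirical ROC computation.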

Fig. 7.

The region 61.0–69.0 Mb on Chr 10 in tumor-normal sample HCC1143 has a change point at ∼65.3 Mb, which separates a gain (left) from a copy-neutral state (right). There are 2805 and 2480 loci in these two states, respectively (5285 loci in total). See Figure 6 for content and legends.
