Summaries of Affymetrix GeneChip probe level data
Rafael A Irizarry et al. Nucleic Acids Res. 2003.
Abstract
High density oligonucleotide array technology is widely used in many areas of biomedical research for quantitative and highly parallel measurements of gene expression. Affymetrix GeneChip arrays are the most popular. In this technology each gene is typically represented by a set of 11-20 pairs of probes. In order to obtain expression measures it is necessary to summarize the probe level data. Using two extensive spike-in studies and a dilution study, we developed a set of tools for assessing the effectiveness of expression measures. We found that the performance of the current version of the default expression measure provided by Affymetrix Microarray Suite can be significantly improved by the use of probe level summaries derived from empirically motivated statistical models. In particular, improvements in the ability to detect differentially expressed genes are demonstrated.
Figures
Figure 1
The smooth curves shown were fitted to the scatter plots of SD versus average of log (base 2) expression for each gene using MAS 5.0, dChip and RMA on the dilution data. All genes for all six concentrations in liver and CNS groups were used.
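The quantities plotted in Figure 1 can be illustrated with a minimal sketch. It assumes the SD is taken on the log (base 2) scale across arrays for each gene, which the caption does not state explicitly; the function name and the toy values are hypothetical, and the curve-smoothing step is not shown.

```python
import math
import statistics


def mean_sd_log2(expression_by_gene):
    """Return {gene: (mean, sd)} of log2 expression across arrays.

    Assumption: expression values are on the linear scale and strictly positive.
    """
    result = {}
    for gene, values in expression_by_gene.items():
        logs = [math.log2(v) for v in values]
        # Sample standard deviation across arrays for this gene.
        result[gene] = (statistics.mean(logs), statistics.stdev(logs))
    return result


# Hypothetical expression values for two genes measured on three arrays.
toy = {"gene_a": [100.0, 200.0, 400.0], "gene_b": [64.0, 64.0, 64.0]}
summary = mean_sd_log2(toy)
```

Each (mean, SD) pair would be one point in the scatter plot to which a smooth curve is then fitted.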
Figure 2
(A) Log (base 2) fold change estimates of gene expression between liver and CNS samples computed from arrays hybridized to 1.25 µg of cRNA using MAS 5.0 plotted against the same estimates obtained from arrays hybridized to 20 µg. Genes demonstrating 2- to 3-fold inconsistencies are shown with squares. Genes demonstrating inconsistencies larger than 3-fold are shown with circles. (B) As (A) but using dChip. (C) As (A) but using RMA.
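The square and circle markers in Figure 2 can be sketched as a simple classification. This assumes an inconsistency is measured as the ratio between the two fold change estimates, that is 2 raised to the absolute difference of the log (base 2) estimates; the caption does not spell out the definition, and the function name and labels are hypothetical.

```python
def classify_inconsistency(log2_fc_low, log2_fc_high):
    """Label a gene by how much its two log2 fold change estimates disagree.

    log2_fc_low:  estimate from arrays hybridized to 1.25 ug of cRNA
    log2_fc_high: estimate from arrays hybridized to 20 ug of cRNA
    """
    # Ratio of the two fold changes on the linear scale (assumed definition).
    ratio = 2.0 ** abs(log2_fc_low - log2_fc_high)
    if ratio > 3.0:
        return "circle"      # inconsistency larger than 3-fold
    if ratio >= 2.0:
        return "square"      # 2- to 3-fold inconsistency
    return "consistent"


labels = [
    classify_inconsistency(1.0, 1.0),
    classify_inconsistency(2.5, 1.0),
    classify_inconsistency(3.0, 1.0),
]
```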
Figure 3
ROC curves for spike-in experiments. (A) For 10 pairs of arrays, chosen at random from the Affymetrix spike-in experiment, true positive rates (sensitivity) are estimated for the filtering operation, Observed Fold Change > cut-off, for a large range of cut-off values, by calculating the proportion of genes spiked-in at different concentrations that satisfy the filtering criterion. False positive rates (1 – specificity) are calculated in a similar way by computing the proportion of non-spiked-in genes that satisfy the filtering criterion. (B) As (A) but using the GeneLogic spike-in experiment. (C) As (A) but selecting 10 comparisons for which the fold changes of spike-in concentrations are 2. (D) As (A) but using the filtering operation test statistic > cut-off. We used the software default test statistics for MAS 5.0 and dChip. (E) As (D) but using the GeneLogic spike-in experiment. (F) As (A) but comparing the average fold changes obtained from two sets of 12 replicate arrays.
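The ROC construction described in panel (A) can be sketched as follows. This is a minimal illustration, assuming the observed fold change is supplied as an absolute log (base 2) value per gene; the function name, gene identifiers and toy values are hypothetical and are not taken from the spike-in data.

```python
def roc_points(abs_log2_fc, spiked_genes, cutoffs):
    """Return a list of (false positive rate, true positive rate) per cut-off.

    abs_log2_fc:  {gene: absolute observed log2 fold change} (assumed input)
    spiked_genes: set of genes spiked in at different concentrations
    """
    spiked = [v for g, v in abs_log2_fc.items() if g in spiked_genes]
    others = [v for g, v in abs_log2_fc.items() if g not in spiked_genes]
    points = []
    for cutoff in cutoffs:
        # Proportion of spiked-in genes passing the filter (sensitivity).
        tpr = sum(v > cutoff for v in spiked) / len(spiked)
        # Proportion of non-spiked-in genes passing it (1 - specificity).
        fpr = sum(v > cutoff for v in others) / len(others)
        points.append((fpr, tpr))
    return points


# Hypothetical values: two spiked-in genes and four non-spiked-in genes.
toy_fc = {"s1": 2.0, "s2": 0.5, "n1": 1.2, "n2": 0.1, "n3": 0.3, "n4": 0.0}
points = roc_points(toy_fc, {"s1", "s2"}, [0.2, 1.0, 1.5])
```

Replacing the fold change with a test statistic, as in panels (D) and (E), would only change the values passed in, not the counting logic.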
Figure 4
MvA plots (described in the text) for Affymetrix’s spike-in experiment. (A) For MAS 5.0, observed log (base 2) fold change (M) is plotted against average log (base 2) expression (A) for all genes from spike-in experiment array pairs. A reference array was selected from one of the replicate spike-in experiments and compared to all other arrays in that replicate experiment. The colored numbers represent the log (base 2) fold change in concentrations of all 14 spiked-in genes. Each distinct fold change is represented with a different color as a visual aid. The –∞ and ∞ represent fold changes with a zero in the numerator or denominator, respectively. The red points represent non-spiked-in genes with a fold change larger than 2. (B) As (A) but using dChip. (C) As (A) but using RMA.
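The M and A coordinates named in the caption can be computed as in the sketch below. It assumes expression values on the linear scale and takes M as the comparison array minus the reference array; the direction of the difference, the function names and the toy numbers are assumptions made for illustration.

```python
import math


def m_and_a(expression, reference):
    """Return (M, A) for one gene on a comparison array and a reference array.

    M is the observed log2 fold change; A is the average log2 expression.
    """
    log_x = math.log2(expression)
    log_ref = math.log2(reference)
    return log_x - log_ref, (log_x + log_ref) / 2.0


def exceeds_twofold(m_value):
    """True when the observed fold change is larger than 2 in either direction."""
    return abs(m_value) > 1.0


# Hypothetical gene measured at 8 units on one array and 2 on the reference.
m, a = m_and_a(8.0, 2.0)
flagged = exceeds_twofold(m)
```

A non-spiked-in gene for which `exceeds_twofold` is true corresponds to one of the red points in the figure.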
Figure 5
Box plots showing the distribution of observed fold changes for non-spiked-in genes. The different colors represent the different quantiles. The relationship of color and quantile is demonstrated in the first box from the left.
Similar articles
- Exploration, normalization, and summaries of high density oligonucleotide array probe level data.
Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP. Biostatistics. 2003 Apr;4(2):249-64. doi: 10.1093/biostatistics/4.2.249. PMID: 12925520
- A benchmark for Affymetrix GeneChip expression measures.
Cope LM, Irizarry RA, Jaffee HA, Wu Z, Speed TP. Bioinformatics. 2004 Feb 12;20(3):323-31. doi: 10.1093/bioinformatics/btg410. PMID: 14960458
- A verification protocol for the probe sequences of Affymetrix genome arrays reveals high probe accuracy for studies in mouse, human and rat.
Alberts R, Terpstra P, Hardonk M, Bystrykh LV, de Haan G, Breitling R, Nap JP, Jansen RC. BMC Bioinformatics. 2007 Apr 20;8:132. doi: 10.1186/1471-2105-8-132. PMID: 17448222 Free PMC article.
- Quality assessment of Affymetrix GeneChip data.
Heber S, Sick B. OMICS. 2006 Fall;10(3):358-68. doi: 10.1089/omi.2006.10.358. PMID: 17069513 Review.
- Custom microarray for glycobiologists: considerations for glycosyltransferase gene expression profiling.
Comelli EM, Amado M, Head SR, Paulson JC. Biochem Soc Symp. 2002;(69):135-42. doi: 10.1042/bss0690135. PMID: 12655780 Review.
Cited by
- Alphacoronavirus protein 7 modulates host innate immune response.
Cruz JL, Becares M, Sola I, Oliveros JC, Enjuanes L, Zúñiga S. J Virol. 2013 Sep;87(17):9754-67. doi: 10.1128/JVI.01032-13. Epub 2013 Jul 3. PMID: 23824792 Free PMC article.
- Utilization of never-medicated bipolar disorder patients towards development and validation of a peripheral biomarker profile.
Clelland CL, Read LL, Panek LJ, Nadrich RH, Bancroft C, Clelland JD. PLoS One. 2013 Jun 24;8(6):e69082. doi: 10.1371/journal.pone.0069082. Print 2013. PMID: 23826396 Free PMC article.
- A Bayesian Approach for Learning Gene Networks Underlying Disease Severity in COPD.
Shaddox E, Stingo FC, Peterson CB, Jacobson S, Cruickshank-Quinn C, Kechris K, Bowler R, Vannucci M. Stat Biosci. 2018;10(1):59-85. doi: 10.1007/s12561-016-9176-6. Epub 2016 Oct 28. PMID: 33912251 Free PMC article.
- SRC-2 coactivator deficiency decreases functional reserve in response to pressure overload of mouse heart.
Reineke EL, York B, Stashi E, Chen X, Tsimelzon A, Xu J, Newgard CB, Taffet GE, Taegtmeyer H, Entman ML, O'Malley BW. PLoS One. 2012;7(12):e53395. doi: 10.1371/journal.pone.0053395. Epub 2012 Dec 31. PMID: 23300926 Free PMC article.
- Down-Regulation of TLR and JAK/STAT Pathway Genes Is Associated with Diffuse Cutaneous Leishmaniasis: A Gene Expression Analysis in NK Cells from Patients Infected with Leishmania mexicana.
Fernández-Figueroa EA, Imaz-Rosshandler I, Castillo-Fernández JE, Miranda-Ortíz H, Fernández-López JC, Becker I, Rangel-Escareño C. PLoS Negl Trop Dis. 2016 Mar 31;10(3):e0004570. doi: 10.1371/journal.pntd.0004570. eCollection 2016 Mar. PMID: 27031998 Free PMC article.