Summaries of Affymetrix GeneChip probe level data
Rafael A Irizarry et al. Nucleic Acids Res. 2003.
Abstract
High density oligonucleotide array technology is widely used in many areas of biomedical research for quantitative and highly parallel measurements of gene expression. Affymetrix GeneChip arrays are the most popular. In this technology each gene is typically represented by a set of 11-20 pairs of probes. In order to obtain expression measures it is necessary to summarize the probe level data. Using two extensive spike-in studies and a dilution study, we developed a set of tools for assessing the effectiveness of expression measures. We found that the performance of the current version of the default expression measure provided by Affymetrix Microarray Suite can be significantly improved by the use of probe level summaries derived from empirically motivated statistical models. In particular, improvements in the ability to detect differentially expressed genes are demonstrated.
Figures
Figure 1
The smooth curves shown were fitted to the scatter plots of SD versus average of log (base 2) expression for each gene using MAS 5.0, dChip and RMA on the dilution data. All genes for all six concentrations in liver and CNS groups were used.
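The points behind such curves can be sketched as follows: for each gene, take the log (base 2) of its expression values across a set of arrays and compute their average and SD. This is a minimal illustration only. It assumes the SD is taken across replicate arrays, the expression values and gene names are hypothetical, and the smoothing step used to draw the curves is not reproduced.

```python
import math
import statistics

def log2_mean_and_sd(expression_values):
    """Return (mean, SD) of log2 expression for one gene across replicate arrays."""
    logs = [math.log2(v) for v in expression_values]
    return statistics.mean(logs), statistics.stdev(logs)

# Hypothetical expression values: two genes, three replicate arrays each.
replicates = {
    "gene_a": [4.0, 8.0, 16.0],
    "gene_b": [100.0, 110.0, 95.0],
}

# One (average, SD) point per gene; a smoother would then be fitted to these points.
points = {gene: log2_mean_and_sd(values) for gene, values in replicates.items()}
```

A lower curve in such a plot indicates less variable expression estimates at a given expression level.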
Figure 2
(A) Log (base 2) fold change estimates of gene expression between liver and CNS samples computed from arrays hybridized to 1.25 µg of cRNA using MAS 5.0 plotted against the same estimates obtained from arrays hybridized to 20 µg. Genes demonstrating 2- to 3-fold inconsistencies are shown with squares. Genes demonstrating inconsistencies larger than 3-fold are shown with circles. (B) As (A) but using dChip. (C) As (A) but using RMA.
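One way to read the square and circle markers is sketched below. It assumes that the inconsistency for a gene is the ratio between its two fold change estimates, that is, 2 raised to the absolute difference of the two log (base 2) fold changes. The function name and the returned labels are hypothetical.

```python
def classify_inconsistency(log2_fc_low, log2_fc_high):
    """Label a gene by how much its two log2 fold change estimates disagree.

    log2_fc_low:  estimate from arrays hybridized to 1.25 ug of cRNA
    log2_fc_high: estimate from arrays hybridized to 20 ug of cRNA
    """
    # Assumption: inconsistency is the fold difference between the two estimates.
    fold_inconsistency = 2 ** abs(log2_fc_low - log2_fc_high)
    if fold_inconsistency > 3:
        return "circle"      # larger than 3-fold
    if fold_inconsistency >= 2:
        return "square"      # 2- to 3-fold
    return "consistent"      # below 2-fold
```

For example, estimates of 3.0 and 1.8 differ by 1.2 on the log (base 2) scale, about a 2.3-fold inconsistency, which falls in the square category.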
Figure 3
ROC curves for spike-in experiments. (A) For 10 pairs of arrays, chosen at random from the Affymetrix spike-in experiment, true positive rates (sensitivity) are estimated for the filtering operation, Observed Fold Change > cut-off, for a large range of cut-off values, by calculating the proportion of genes spiked-in at different concentrations that satisfy the filtering criterion. False positive rates (1 – specificity) are calculated in a similar way by computing the proportion of non-spiked-in genes that satisfy the same criterion. (B) As (A) but using the GeneLogic spike-in experiment. (C) As (A) but selecting 10 comparisons for which the fold changes of spike-in concentrations are 2. (D) As (A) but using the filtering operation test statistic > cut-off. We used the software default test statistics for MAS 5.0 and dChip. (E) As (D) but using the GeneLogic spike-in experiment. (F) As (A) but comparing the average fold changes obtained from two sets of 12 replicate arrays.
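The calculation in (A) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, fold changes are taken on the log (base 2) scale, and the filter is applied to their absolute value so that changes in either direction count.

```python
def roc_points(log2_fold_changes, is_spiked_in, cutoffs):
    """Return one (false positive rate, true positive rate) pair per cut-off.

    A gene passes the filter when |observed log2 fold change| > cut-off.
    """
    spiked = [abs(m) for m, s in zip(log2_fold_changes, is_spiked_in) if s]
    non_spiked = [abs(m) for m, s in zip(log2_fold_changes, is_spiked_in) if not s]
    points = []
    for cutoff in cutoffs:
        # Proportion of spiked-in genes that pass the filter (sensitivity).
        tpr = sum(m > cutoff for m in spiked) / len(spiked)
        # Proportion of non-spiked-in genes that pass the filter (1 - specificity).
        fpr = sum(m > cutoff for m in non_spiked) / len(non_spiked)
        points.append((fpr, tpr))
    return points

# Hypothetical data: three spiked-in genes followed by three non-spiked-in genes.
example = roc_points(
    [2.0, -1.5, 0.2, 0.1, -0.3, 1.2],
    [True, True, True, False, False, False],
    cutoffs=[0.0, 1.0, 3.0],
)
```

Sweeping the cut-off from small to large traces the curve from the top right corner (every gene passes) to the bottom left (no gene passes).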
Figure 4
MvA plots (described in the text) for Affymetrix’s spike-in experiment. (A) For MAS 5.0, observed log (base 2) fold change (M) is plotted against average log (base 2) expression (A) for all genes from spike-in experiment array pairs. A reference array was selected from one of the replicate spike-in experiments and compared to all other arrays in that replicate experiment. The colored numbers represent the log (base 2) fold change in concentrations of all 14 spiked-in genes. Each distinct fold change is represented with a different color as a visual aid. The –∞ and ∞ represent fold changes with a zero in the numerator or denominator, respectively. The red points represent non-spiked-in genes with a fold change larger than 2. (B) As (A) but using dChip. (C) As (A) but using RMA.
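The two coordinates of an MvA plot follow directly from the definitions in the caption. The sketch below assumes expression values on the original (unlogged) scale and computes M as the comparison array relative to the reference array; the function name and the direction of the ratio are assumptions.

```python
import math

def mva(reference, comparison):
    """Return lists (M, A) for paired expression values from two arrays.

    M: log2 fold change, comparison relative to reference.
    A: average of the two log2 expression values.
    """
    m_values, a_values = [], []
    for ref, cmp_ in zip(reference, comparison):
        log_ref, log_cmp = math.log2(ref), math.log2(cmp_)
        m_values.append(log_cmp - log_ref)
        a_values.append((log_cmp + log_ref) / 2)
    return m_values, a_values

# Hypothetical values: the first gene changes 4-fold, the second is unchanged.
M, A = mva([4.0, 8.0], [16.0, 8.0])
```

Under this convention, a non-spiked-in gene with a fold change larger than 2 is one with |M| > 1.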
Figure 5
Box plots showing the distribution of observed fold changes for non-spiked-in genes. The different colors represent the different quantiles. The relationship between color and quantile is shown in the first box from the left.
Similar articles
- Exploration, normalization, and summaries of high density oligonucleotide array probe level data.
Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP. Biostatistics. 2003 Apr;4(2):249-64. doi: 10.1093/biostatistics/4.2.249. PMID: 12925520
- A benchmark for Affymetrix GeneChip expression measures.
Cope LM, Irizarry RA, Jaffee HA, Wu Z, Speed TP. Bioinformatics. 2004 Feb 12;20(3):323-31. doi: 10.1093/bioinformatics/btg410. PMID: 14960458
- A verification protocol for the probe sequences of Affymetrix genome arrays reveals high probe accuracy for studies in mouse, human and rat.
Alberts R, Terpstra P, Hardonk M, Bystrykh LV, de Haan G, Breitling R, Nap JP, Jansen RC. BMC Bioinformatics. 2007 Apr 20;8:132. doi: 10.1186/1471-2105-8-132. PMID: 17448222 Free PMC article.
- Quality assessment of Affymetrix GeneChip data.
Heber S, Sick B. OMICS. 2006 Fall;10(3):358-68. doi: 10.1089/omi.2006.10.358. PMID: 17069513 Review.
- Custom microarray for glycobiologists: considerations for glycosyltransferase gene expression profiling.
Comelli EM, Amado M, Head SR, Paulson JC. Biochem Soc Symp. 2002;(69):135-42. doi: 10.1042/bss0690135. PMID: 12655780 Review.
Cited by
- The ion channel TRPA1 is required for chronic itch.
Wilson SR, Nelson AM, Batia L, Morita T, Estandian D, Owens DM, Lumpkin EA, Bautista DM. J Neurosci. 2013 May 29;33(22):9283-94. doi: 10.1523/JNEUROSCI.5318-12.2013. PMID: 23719797 Free PMC article.
- Identification of Orch3, a locus controlling dominant resistance to autoimmune orchitis, as kinesin family member 1C.
del Rio R, McAllister RD, Meeker ND, Wall EH, Bond JP, Kyttaris VC, Tsokos GC, Tung KS, Teuscher C. PLoS Genet. 2012;8(12):e1003140. doi: 10.1371/journal.pgen.1003140. Epub 2012 Dec 27. PMID: 23300462 Free PMC article.
- Maternal obesity programs mitochondrial and lipid metabolism gene expression in infant umbilical vein endothelial cells.
Costa SM, Isganaitis E, Matthews TJ, Hughes K, Daher G, Dreyfuss JM, da Silva GA, Patti ME. Int J Obes (Lond). 2016 Nov;40(11):1627-1634. doi: 10.1038/ijo.2016.142. Epub 2016 Aug 17. PMID: 27531045 Free PMC article.
- Tissue-engineered fetal dermal matrices.
Pouyani T, Papp S, Schaffer L. In Vitro Cell Dev Biol Anim. 2012 Sep;48(8):493-506. doi: 10.1007/s11626-012-9541-9. Epub 2012 Sep 6. PMID: 22956043
- Comparative Analysis of Shapley Values Enhances Transcriptomics Insights across Some Common Uterine Pathologies.
Castro-Martínez JA, Vargas E, Díaz-Beltrán L, Esteban FJ. Genes (Basel). 2024 Jun 1;15(6):723. doi: 10.3390/genes15060723. PMID: 38927658 Free PMC article.