Statistical implications of pooling RNA samples for microarray experiments

Xuejun Peng et al. BMC Bioinformatics. 2003.

Abstract

Background: Microarray technology has become an important tool for studying gene expression profiles under various conditions. Biologists often pool RNA samples extracted from different subjects onto a single microarray chip, both to defray the cost of microarray experiments and to overcome the technical difficulty of obtaining sufficient RNA from a single subject. However, the statistical, technical and financial implications of pooling have not been explicitly investigated.

Results: Modeling the resulting gene expression from sample pooling as a mixture of individual responses, we derived expressions for the experimental error and provided both upper and lower bounds for its value in terms of the variability among individuals and the number of RNA samples pooled. Using "virtual" pooling of data from real experiments and computer simulations, we investigated the statistical properties of RNA sample pooling. Our study reveals that pooling biological samples appropriately is statistically valid and efficient for microarray experiments. Furthermore, optimal pooling design(s) can be found to meet statistical requirements while minimizing total cost.
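The dependence of the error on individual variability and pool size can be illustrated with a minimal sketch. It assumes (this is not stated in the abstract) that the pooled signal is a weighted average of independent individual responses with a common variance `sigma2`, with non-negative weights that sum to 1. Under that assumption the biological variance of a pool is `sigma2 * sum(w_i**2)`, which lies between `sigma2 / k` (equal weights) and `sigma2` (one subject dominates). The function names are hypothetical, and the authors' actual error expressions may differ in form.

```python
def pooled_variance_factor(weights):
    """Return sum(w_i^2) after normalising the weights to sum to 1."""
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    normalised = [w / total for w in weights]
    return sum(w * w for w in normalised)


def pooled_variance(weights, sigma2):
    """Biological variance of one pool (assumed weighted-average model)."""
    return sigma2 * pooled_variance_factor(weights)


def variance_bounds(k, sigma2):
    """Lower bound (equal mixing) and upper bound (one subject dominates)."""
    return sigma2 / k, sigma2


sigma2 = 1.0  # hypothetical between-subject variance
equal = pooled_variance([1, 1, 1], sigma2)          # equal contribution
unequal = pooled_variance([0.7, 0.2, 0.1], sigma2)  # weights from the Figure 1 note
lower, upper = variance_bounds(3, sigma2)
print(equal, unequal, lower, upper)
```

With three subjects per pool, equal mixing attains the lower bound, while the 0.7/0.2/0.1 weights give a larger variance that still stays below the single-subject value.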

Conclusions: Appropriate RNA pooling can provide equivalent power and improve efficiency and cost-effectiveness for microarray experiments with a modest increase in total number of subjects. Pooling schemes in terms of replicates of subjects and arrays can be compared before experiments are conducted.

Figures

Figure 1

Relationships among type I error rate, sample size, effect size, and power with or without pooling. Note: n is the number of biological replicates per treatment group; c is the number of gene chips per group; α is the type I error rate; EQ means that samples are pooled with equal contribution; NE means that samples do not contribute equally when pooled together (weights assigned randomly to each chip: 0.7, 0.2, and 0.1).

Figure 2

Approximately equivalent power curves under different pooling schemes. Power curves generated for two-sample t tests. Equal pooling assumed. Legend: n is the number of subjects per treatment group; c is the number of arrays per group. The five pooling schemes with different choices of number of subjects and number of arrays have approximately equivalent power curves when type I error rate is controlled at 0.05.
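How the numbers of subjects and arrays trade off against each other can be sketched with a rough power calculation. The sketch assumes equal pooling, a between-subject variance `sigma_b2`, and an independent per-array technical variance `sigma_e2`, so that a group mean has variance `sigma_b2 / n + sigma_e2 / c`. It also uses a normal approximation in place of the two-sample t test used for the figure. The function name, the variance split, and all numeric values are illustrative assumptions, not figures from the paper.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n, c, delta, sigma_b2, sigma_e2, alpha=0.05):
    """Approximate two-sided power for n subjects on c arrays per group.

    Assumes equal pooling of n / c subjects per array and a normal
    approximation to the two-sample t test.
    """
    if c < 1 or n % c != 0:
        raise ValueError("n must be a positive multiple of c")
    # Standard error of the difference between the two group means.
    se = sqrt(2.0 * (sigma_b2 / n + sigma_e2 / c))
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = delta / se
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical effect size and variance components, for illustration only.
for n, c in [(12, 12), (12, 6), (12, 4), (12, 3)]:
    power = approx_power(n, c, delta=1.0, sigma_b2=1.0, sigma_e2=0.25)
    print(f"n={n:2d} c={c:2d} power={power:.3f}")
```

Under these assumptions, reducing the number of arrays costs power only through the technical term `sigma_e2 / c`, which is why adding a few subjects can compensate for using fewer arrays.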

Figure 3

Scatter plots of the P-values under different "virtual" pooling schemes. The Y-axis shows the P-values from two-sample t-tests for 8799 genes on the RGU34A gene chip with no pooling (12 subjects, 12 arrays per group). The X-axis shows the P-values from two-sample t-tests for pool size 2 (12 subjects, 6 arrays per group), pool size 3 (12 subjects, 4 arrays per group), and pool size 4 (12 subjects, 3 arrays per group), respectively.
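One way to picture "virtual" pooling for a single gene is to average the measurements of consecutive subjects and then run the two-sample test on the averaged values. The sketch below assumes equal-weight averaging on whatever scale the data are analysed; the caption does not say how the pools were formed, so this is an assumption. The helper names and the simulated values are hypothetical. Only the t statistic is computed, since the standard library has no t distribution; a P-value could be obtained from a statistics package.

```python
import random
from math import sqrt
from statistics import mean, variance


def virtual_pool(values, pool_size):
    """Average consecutive blocks of pool_size individual measurements."""
    if pool_size < 1 or len(values) % pool_size != 0:
        raise ValueError("pool_size must evenly divide the number of subjects")
    return [mean(values[i:i + pool_size])
            for i in range(0, len(values), pool_size)]


def t_statistic(group_a, group_b):
    """Equal-variance two-sample t statistic."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Simulated expression values for one gene: 12 subjects per group.
rng = random.Random(0)
control = [rng.gauss(0.0, 1.0) for _ in range(12)]
treated = [rng.gauss(1.0, 1.0) for _ in range(12)]

for pool_size in (1, 2, 3, 4):
    a = virtual_pool(control, pool_size)
    b = virtual_pool(treated, pool_size)
    print(pool_size, len(a), round(t_statistic(a, b), 3))
```

Repeating this per gene and plotting the pooled results against the unpooled ones would give a comparison of the kind shown in the figure.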
