Post-analysis follow-up and validation of microarray experiments

Repeatability of published microarray gene expression analyses

Nature Genetics, 2009

Given the complexity of microarray-based gene expression studies, guidelines encourage transparent design and public data availability. Several journals require public data deposition and several public databases exist. However, not all data are publicly available, and even when available, it is unknown whether the published results are reproducible by independent scientists. Here we evaluated the replication of data analyses in 18 articles on microarray-based gene expression profiling published in Nature Genetics in 2005-2006. One table or figure from each article was independently evaluated by two teams of analysts. We reproduced two analyses in principle and six partially or with some discrepancies; ten could not be reproduced. The main reason for failure to reproduce was data unavailability, and discrepancies were mostly due to incomplete data annotation or specification of data processing and analysis. Repeatability of published microarray studies is apparently limited. More strict publication rules enforcing public data availability and explicit description of data processing and analysis should be considered.

Cross-study validation and combined analysis of gene expression microarray data

Biostatistics, 2007

Investigations of transcript levels on a genomic scale using hybridization-based arrays have led to formidable advances in our understanding of the biology of many human illnesses. At the same time, these investigations have generated controversy because of the probabilistic nature of the conclusions and the surfacing of noticeable discrepancies between the results of studies addressing the same biological question. In this article, we present simple and effective data analysis and visualization tools for gauging the degree to which the findings of one study are reproduced by others and for integrating multiple studies in a single analysis. We describe these approaches in the context of studies of breast cancer and illustrate that it is possible to identify a substantial biologically relevant subset of the human genome within which hybridization results are reliable. The subset generally varies with the platforms used, the tissues studied, and the populations being sampled. Despite important differences, it is also possible to develop simple expression measures that allow comparison across platforms, studies, laboratories and populations. Important biological signals are often preserved or enhanced. Cross-study validation and combination of microarray results requires careful, but not overly complex, statistical thinking and can become a routine component of genomic analysis.
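One way to gauge how well a gene's behaviour in one study is reproduced in another is to compare its co-expression profile across the two studies. The sketch below is only an illustration of that idea, not the tool described in the article: the function names, the dictionary layout (gene identifier mapped to a list of expression values) and the choice of Pearson correlation are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_profile(study, gene):
    """Correlations of `gene` with every other gene in one study, in sorted gene order."""
    others = sorted(g for g in study if g != gene)
    return [pearson(study[gene], study[g]) for g in others]


def reproducibility_score(study_a, study_b, gene):
    """Correlate the co-expression profile of `gene` between two studies.

    Hypothetical score: values near 1 suggest the gene relates to the
    other shared genes in the same way in both studies.
    """
    shared = set(study_a) & set(study_b)
    a = {g: study_a[g] for g in shared}
    b = {g: study_b[g] for g in shared}
    return pearson(correlation_profile(a, gene), correlation_profile(b, gene))
```

Because the score depends only on within-study correlations, the two studies may use different measurement scales and different numbers of samples, which is the situation cross-platform comparisons typically face.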

Meta-analysis of microarrays: interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer

Cancer Research, 2002

The increasing availability and maturity of DNA microarray technology has led to an explosion of cancer profiling studies. To extract maximum value from the accumulating mass of publicly available cancer gene expression data, methods are needed to evaluate, integrate, and intervalidate multiple datasets. Here we demonstrate a statistical model for performing meta-analysis of independent microarray datasets. Implementation of this model revealed that four prostate cancer gene expression datasets shared significantly similar results, independent of the method and technology used (i.e., spotted cDNA versus oligonucleotide). This interstudy cross-validation approach generated a cohort of genes that were consistently and significantly dysregulated in prostate cancer. Bioinformatic investigation of these genes revealed a synchronous network of transcriptional regulation in the polyamine and purine biosynthesis pathways. Beyond the specific implications for prostate cancer, this work establishes a much-needed model for the evaluation, cross-validation, and comparison of multiple cancer profiling studies.
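The abstract does not spell out the statistical model, so the following is only a generic illustration of how per-gene evidence from several independent datasets can be pooled. It uses Fisher's method for combining P values, which is a standard technique chosen here as an assumption, not necessarily the authors' model. The function name is hypothetical, and the calculation assumes the studies are independent.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P values for one gene with Fisher's method."""
    if not p_values:
        raise ValueError("need at least one P value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    k = len(p_values)
    # Fisher's statistic X = -2 * sum(ln p_i) is chi-square with 2k
    # degrees of freedom under the null hypothesis; we work with X / 2.
    half_x = -sum(math.log(p) for p in p_values)
    # For even degrees of freedom the chi-square upper tail has a closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return min(1.0, math.exp(-half_x) * total)
```

For example, a gene with P = 0.05 in each of two studies gets a combined P of roughly 0.0175, so modest but consistent evidence across datasets is rewarded. In practice the combined values would still need a multiple-testing correction across genes.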

A methodology for global validation of microarray experiments

2006

Background: DNA microarrays are popular tools for measuring gene expression in biological samples. Their ever-increasing popularity means that a large number of microarray studies are conducted, many of which make their data publicly available for mining by other investigators. Under most circumstances, differential expression is validated on a gene-by-gene basis. Thus, it is not possible to generalize validation results to the remaining majority of non-validated genes or to evaluate the overall quality of these studies.

Statistical Test of Expression Pattern (STEPath): a new strategy to integrate gene expression data with genomic information in individual and meta-analysis studies

BMC Bioinformatics

Microarrays as Validation Strategies in Clinical Samples: Tissue and Protein Microarrays

OMICS: A Journal of Integrative Biology, 2006

The widespread use of DNA microarrays has led to the discovery of many genes whose expression profile may have significant clinical relevance. The translation of this data to the bedside requires that gene expression be validated as protein expression, and that annotated clinical samples be available for correlative and quantitative studies to assess clinical context and usefulness of putative biomarkers. We review two microarray platforms developed to facilitate the clinical validation of candidate biomarkers: tissue microarrays and reverse-phase protein microarrays. Tissue microarrays are arrays of core biopsies obtained from paraffin-embedded tissues, which can be assayed for histologically specific protein expression by immunohistochemistry. Reverse-phase protein microarrays consist of arrays of cell lysates or, more recently, plasma or serum samples, which can be assayed for protein quantity and for the presence of post-translational modifications such as phosphorylation. Although these platforms are limited by the availability of validated antibodies, both enable the preservation of precious clinical samples as well as experimental standardization in a high-throughput manner proper to microarray technologies. While tissue microarrays are rapidly becoming a mainstay of translational research, reverse-phase protein microarrays require further technical refinements and validation prior to their widespread adoption by research laboratories.

Statistical issues and methods for meta-analysis of microarray data: a case study in prostate cancer

Functional & Integrative Genomics, 2003

With the proliferation of related microarray studies by independent groups, a natural step in the analysis of these gene expression data is to combine the results across these studies. However, this raises a variety of issues in the analysis of such data. In this article, we discuss the statistical issues of combining data from multiple gene expression studies. This leads to more complications than those in standard meta-analyses, including different experimental platforms, duplicate spots and complex data structures. We illustrate these ideas using data from four prostate cancer profiling studies. In addition, we develop a simple approach for assessing differential expression using the LASSO method. A combination of the results and the pathway databases are then used to generate candidate biological pathways for cancer.
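Two of the complications mentioned above, duplicate spots and differing platforms, usually have to be dealt with before any results can be combined. The sketch below shows one simple way to do so, under stated assumptions: duplicate spots are collapsed by averaging their log-ratios, and only genes measured in every study are carried forward. Both choices and all function names are illustrative and are not taken from the article.

```python
from collections import defaultdict
from statistics import mean


def collapse_duplicate_spots(spots):
    """Average the log-ratios of spots that map to the same gene.

    `spots` is an iterable of (gene_id, log_ratio) pairs from one array.
    Averaging is an assumption; medians or per-spot models are alternatives.
    """
    by_gene = defaultdict(list)
    for gene_id, log_ratio in spots:
        by_gene[gene_id].append(log_ratio)
    return {gene_id: mean(values) for gene_id, values in by_gene.items()}


def common_genes(*studies):
    """Gene identifiers present in every study (each study is a dict keyed by gene)."""
    if not studies:
        return set()
    shared = set(studies[0])
    for study in studies[1:]:
        shared &= set(study)
    return shared
```

Restricting the analysis to the shared gene set discards information, but it keeps the per-gene summaries comparable across platforms that print different probes.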

The Reproducibility of Lists of Differentially Expressed Genes in Microarray Studies

Nature Precedings, 2007

Reproducibility is a fundamental requirement in scientific experiments and clinical contexts. Recent publications raise concerns about the reliability of microarray technology because of the apparent lack of agreement between lists of differentially expressed genes (DEGs). In this study we demonstrate that (1) such discordance may stem from ranking and selecting DEGs solely by statistical significance (P) derived from widely used simple t-tests; (2) when fold change (FC) is used as the ranking criterion, the lists become much more reproducible, especially when fewer genes are selected; and (3) the instability of short DEG lists based on P cutoffs is an expected mathematical consequence of the high variability of the t-values. We recommend the use of FC ranking plus a non-stringent P cutoff as a baseline practice in order to generate more reproducible DEG lists. The FC criterion enhances reproducibility while the P criterion balances sensitivity and specificity.
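The recommended baseline, fold-change ranking combined with a non-stringent P cutoff, can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the default cutoff of 0.05, the default list length and the overlap measure are assumptions, and the P values are taken as precomputed inputs.

```python
def select_degs(genes, log2_fc, p_values, p_cutoff=0.05, top_n=100):
    """Return up to `top_n` gene names that pass the P cutoff, ranked by |log2 FC|."""
    # Keep only genes that pass the (deliberately non-stringent) P filter.
    passing = [
        (abs(fc), name)
        for name, fc, p in zip(genes, log2_fc, p_values)
        if p < p_cutoff
    ]
    # Rank by absolute fold change, largest first; ties are broken by name.
    passing.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in passing[:top_n]]


def percent_overlap(list_a, list_b):
    """Percentage of genes shared by two DEG lists, relative to the shorter list."""
    if not list_a or not list_b:
        return 0.0
    shared = len(set(list_a) & set(list_b))
    return 100.0 * shared / min(len(list_a), len(list_b))
```

Running `select_degs` on two replicate experiments and passing the resulting lists to `percent_overlap` gives a rough check of how reproducible the selection is for a given list length.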