FRI0231 Study of urate transporters in primary hyperuricemia and gout
Related papers
Microarray data analysis typically consists of identifying a list of differentially expressed genes (DEG), i.e., the genes that are differentially expressed between two experimental conditions. Variance shrinkage methods have been considered a better choice than the standard t-test for selecting the DEG because they correct the dependence of the error on the expression level. This dependence is mainly caused by errors in background correction, which affect genes with low expression values more severely. Here, we propose a new method for identifying the DEG that overcomes this issue and does not require background correction or variance shrinkage. Unlike current methods, our methodology is easy to understand and to implement. It consists of applying the standard t-test directly to the normalized intensity data, which is possible because the probe intensity is linearly dependent on the gene expression level and because the t-test is scale- and location-invariant. This methodology considerably improves the sensitivity and robustness of the list of DEG when compared to the t-test applied to preprocessed data and to the most widely used shrinkage methods, SAM and LIMMA. Our approach is especially useful when the genes of interest have small differences in expression and therefore get ignored by standard variance shrinkage methods.
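The invariance argument in this abstract can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the pooled-variance Student's t statistic as the "standard t-test", and the expression values, `gain`, and `background` are hypothetical numbers chosen only for illustration.

```python
import math

def t_statistic(a, b):
    """Two-sample Student's t statistic with pooled variance (assumed variant)."""
    na, nb = len(a), len(b)
    mean_a = sum(a) / na
    mean_b = sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def to_intensity(values, gain=250.0, background=80.0):
    """Hypothetical linear probe response: intensity = gain * expression + background."""
    return [gain * v + background for v in values]

# Hypothetical expression levels of one gene under two conditions.
expr_a = [1.0, 1.2, 0.9, 1.1]
expr_b = [1.5, 1.7, 1.6, 1.4]

t_expr = t_statistic(expr_a, expr_b)
t_int = t_statistic(to_intensity(expr_a), to_intensity(expr_b))

# The two statistics agree up to floating-point rounding.
print(t_expr, t_int)
```

The additive `background` cancels in the difference of means and in the deviations from each mean, and a positive `gain` scales numerator and denominator equally, so the statistic is unchanged. A negative gain would flip its sign.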
Increased comparability between RNA-Seq and microarray data by utilization of gene sets
PLOS Computational Biology
The field of transcriptomics uses and measures mRNA as a proxy of gene expression. There are currently two major platforms in use for quantifying mRNA: microarray and RNA-Seq. Many comparative studies have shown that their results are not always consistent. In this study we aim to find a robust method to increase the comparability of both platforms, enabling analysis of merged data from both platforms. We transformed high-dimensional transcriptomics data from two different platforms into a lower-dimensional and biologically relevant dataset by calculating enrichment scores based on gene set collections for all samples. We compared the similarity between data from both platforms based on the raw data and on the enrichment scores. We show that the performed transformation changes the data in a biologically relevant way and filters out noise, which leads to increased platform concordance. We validate the procedure using predictive models built with microarray-based enrichment scores to predict subtypes of breast cancer using enrichment scores based on sequenced data. Although microarray and RNA-Seq expression levels might appear different, transforming them into biologically relevant gene set enrichment scores significantly increases their correlation, which is a step forward in data integration of the two platforms. The gene set collections were shown to contain biologically relevant gene sets. More in-depth investigation of the effect of the composition, size, and number of gene sets that are used for the transformation is suggested for future research.
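To make the idea of collapsing a genes-by-samples matrix into a gene-sets-by-samples matrix concrete, here is a rough sketch. The abstract does not say which enrichment score was used, so the mean z-score per gene set below is a stand-in assumption, and the gene names, gene sets, and values are hypothetical.

```python
from statistics import mean, pstdev

def zscore_rows(matrix):
    """Standardize each gene (row) across samples; constant genes become zeros."""
    standardized = {}
    for gene, values in matrix.items():
        mu, sd = mean(values), pstdev(values)
        standardized[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return standardized

def gene_set_scores(matrix, gene_sets):
    """Per-sample score of each gene set: mean z-score of its measured genes.

    This is an assumed, simplified score, not the method used in the study.
    """
    z = zscore_rows(matrix)
    n_samples = len(next(iter(matrix.values())))
    scores = {}
    for name, genes in gene_sets.items():
        present = [g for g in genes if g in z]  # ignore genes missing on this platform
        if present:
            scores[name] = [mean(z[g][j] for g in present) for j in range(n_samples)]
        else:
            scores[name] = [0.0] * n_samples
    return scores

# Hypothetical expression matrix: 4 genes measured in 3 samples.
expression = {
    "G1": [1.0, 2.0, 3.0],
    "G2": [2.0, 2.5, 4.0],
    "G3": [5.0, 4.0, 1.0],
    "G4": [3.0, 3.5, 2.0],
}
# Hypothetical gene sets; "G9" is not measured and is skipped.
gene_sets = {"set_A": ["G1", "G2"], "set_B": ["G3", "G4", "G9"]}

scores = gene_set_scores(expression, gene_sets)
for name, values in scores.items():
    print(name, [round(v, 3) for v in values])
```

Because each score depends only on set membership and on values standardized within a platform, the same function could be applied separately to microarray and RNA-Seq matrices before comparing or merging the resulting scores.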
Microarray-based gene set analysis: a comparison of current methods
BMC Bioinformatics, 2008
Background: The analysis of gene sets has become a popular topic in recent times, with researchers attempting to improve the interpretability and reproducibility of their microarray analyses through the inclusion of supplementary biological information. While a number of options for gene set analysis exist, no consensus has yet been reached regarding which methodology performs best, and under what conditions. The goal of this work was to examine the performance characteristics of a collection of existing gene set analysis methods, on both simulated and real microarray data sets. Of particular interest was the potential utility gained through the incorporation of inter-gene correlation into the analysis process. Results: Each of six gene set analysis methods was applied to both simulated and publicly available microarray data sets. Overall, the various methodologies were all found to be better at detecting gene sets that moved from non-active (i.e., genes not expressed) to active states (or vice versa), rather than those that simply changed their level of activity. Methods which incorporate correlation structures were found to provide increased ability to detect altered gene sets in some settings. Conclusion: Based on the results obtained through the analysis of simulated data, it is clear that the performance of gene set analysis methods is strongly influenced by the features of the data set in question, and that methods which incorporate correlation structures into the analysis process tend to achieve better performance, relative to methods which rely on univariate test statistics.
The limitations of simple gene set enrichment analysis assuming gene independence
2012
Since its first publication in 2003, the Gene Set Enrichment Analysis (GSEA) method, based on the Kolmogorov-Smirnov statistic, has been heavily used, modified, and also questioned. Recently a simplified approach, using a one-sample t-test score to assess enrichment and ignoring gene-gene correlations, was proposed by Irizarry et al. (2009) as a serious contender. The argument criticizes GSEA's nonparametric nature and its use of an empirical null distribution as unnecessary and hard to compute. We refute these claims by careful consideration of the assumptions of the simplified method and its results, including a comparison with GSEA's on a large benchmark set of 50 datasets. Our results provide strong empirical evidence that gene-gene correlations cannot be ignored, due to the significant variance inflation they produce in the enrichment scores, and should be taken into account when estimating gene set enrichment significance. In addition, we discuss the challenges that the complex correlation structure and multi-modality of gene sets pose more generally for gene set enrichment methods. As soon as this original version of GSEA appeared, objections were raised to the approach [10], some of which were immediately refuted in [11], and the rest were met by our subsequent improvement of the GSEA methodology. In Subramanian and Tamayo et al. [12] we introduced a version of GSEA that used a correlation-weighted Kolmogorov-Smirnov statistic, an improved enrichment normalization procedure, and an FDR-based estimate of significance that collectively made GSEA appreciably more sensitive, more general, and more robust. As a result of these improvements, and the public availability of the software and companion Molecular Signatures Database (MSigDB) [www.broadinstitute.org/gsea], GSEA became a widely used method and was applied to numerous problems across many application domains.
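The variance inflation mentioned above can be illustrated with a standard textbook result rather than the authors' own analysis: if the n gene-level scores in a set each have variance sigma^2 and every pair has the same correlation rho (an equicorrelation assumption made here only for illustration), the variance of their mean is (sigma^2 / n) * (1 + (n - 1) * rho). The function names below are hypothetical.

```python
def inflation_factor(n_genes, rho):
    """Ratio of Var(mean) under equicorrelation rho to Var(mean) under independence."""
    if not -1.0 / (n_genes - 1) <= rho <= 1.0:
        raise ValueError("rho must lie in [-1/(n-1), 1] for a valid correlation matrix")
    return 1.0 + (n_genes - 1) * rho

def variance_of_mean(n_genes, rho, sigma2=1.0):
    """Variance of the average of n_genes equally correlated scores."""
    return sigma2 / n_genes * inflation_factor(n_genes, rho)

# Hypothetical gene set sizes and a modest average pairwise correlation.
for n in (10, 50, 200):
    print(n, inflation_factor(n, 0.1), variance_of_mean(n, 0.1))
```

Even a small average correlation inflates the variance by a factor that grows roughly linearly with the size of the gene set, so a test that assumes independence understates the spread of its own null distribution.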
Notably, since the original release of the software and database in 2005, the number of GSEA user registrations has grown to over 33,000, and the method has been used and cited in more than 3,100 scientific publications. GSEA and other gene set analysis methods have also motivated the development of general statistical methodologies for large-scale inference for "sets" of variables [13,14]. The specific knowledge-based approach pioneered by GSEA is now standard practice in the analysis of gene expression data and has inspired the development of a large and growing family of conceptually similar methods. For example, Huang et al. [15] identified at least 68 different gene set enrichment methods in their survey. A family of popular methods estimates the over-representation of Gene Ontology (GO) annotations using a hypergeometric statistic or Fisher's exact test (e.g., GoMiner [16], FatiGO [17], GoSurfer [18], EasyGo [19], David [20]). These methods restrict consideration to the "top" of the list, and may miss more subtle signals. They also assume gene independence and thus produce overly optimistic results [15,21,22,23,24,25]. In addition, several improvements to gene set enrichment analysis itself have been proposed. These include those used in [26], GSA [27], SAFE [28], Catmap [29], ErmineJ [30], SAM-GS [31], and PROPA [32]. They employ alternative ranking metrics, enrichment statistics, and several variations on significance estimation schemes. Notably, [33] demonstrates the difficulty of finding a single, optimal statistic due to the complexity, heterogeneity and multi-modal distribution of the expression levels of genes within gene sets. Other somewhat more sophisticated methods (e.g., FunNet [34], PARADIGM [35], COFECO [36]) take a network-based approach, but restrict the analysis to processes where a deeper understanding of gene-gene interactions is already available.
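For readers unfamiliar with the over-representation tests mentioned above, the following sketch computes a one-sided hypergeometric p-value, which is equivalent to a one-sided Fisher's exact test on the corresponding 2x2 table. It is a generic illustration, not the code of any of the cited tools; the function name and the counts in the example are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def hypergeom_upper_tail(population, annotated, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `population` genes, of which `annotated` carry the annotation."""
    total = comb(population, selected)
    upper = min(annotated, selected)
    favorable = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / total

# Hypothetical counts: 10,000 genes on the array, 200 annotated with a GO term,
# 100 genes in the "top" list, 8 of which carry the annotation.
p_value = hypergeom_upper_tail(10_000, 200, 100, 8)
print(p_value)
```

The calculation treats every gene in the list as an independent draw, which is the gene-independence assumption the text identifies as a source of overly optimistic results.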
The primary advantages of GSEA are that it only requires gene set membership information to compute enrichment scores, considers the entire ranked list of genes, and maintains the gene-gene dependency that reflects real biology. This yields a good compromise between sensitivity, performance and applicability. Recently, Irizarry et al. [37] in their "Gene Set Enrichment Made Simple" article proposed a "simpler" approach to gene set expression analysis assuming gene independence and using a one-sample t-test to estimate enrichment. Here we will refer to their method as SEA (Simpler Enrichment Analysis). The rationale for SEA is based on their perception that gene independence is a reasonable simplifying assumption and that simpler parametric approaches to gene set analysis have been ignored. Both of these assumptions disregard a large body of literature where many authors have already introduced "simple" parametric methods for gene set analysis [38,39,40,41,42]. Many researchers have demonstrated the unrealistic nature and limitations of the gene
BMC Bioinformatics
Statistical intelligence: effective analysis of high-density microarray data
Drug Discovery Today, 2002
Microarrays enable researchers to interrogate thousands of genes simultaneously. A crucial step in data analysis is the selection of subsets of interesting genes from the initial set of genes. In many cases, especially when comparing genes expressed in a specific condition to a reference condition, the genes of interest are those which are differentially regulated.
How to get the most from microarray data: advice from reverse genomics
BMC Genomics, 2014
Background: Whole-genome profiling of gene expression is a powerful tool for identifying cancer-associated genes. Genes differentially expressed between normal and tumorous tissues are usually considered to be cancer-associated. We recently demonstrated that the analysis of interindividual variation in gene expression can be useful for identifying cancer-associated genes. The goal of this study was to identify the best microarray data-derived predictor of known cancer-associated genes.
Microarray Data Analysis: General Concepts, Gene Selection, and Classification
Cerutti/Advanced, 2011
Discoveries from the genome sequencing projects facilitated the development of novel techniques able to screen thousands of molecules in parallel and identify sets of potentially interesting sequences associated with physiological/pathological conditions. As a consequence, high-throughput, large-scale experimental methodologies, combined with bioinformatics analysis of DNA, RNA, and protein data projected biological sciences into the so-called post-genomic functional genomics era. The exploration of all genes or proteins at once, in a systematic fashion, represents a sort of revolution, shifting molecular biology and medicine research from a reductionistic, hypothesis-driven approach toward deciphering how genes and their products work, how they interact in pathways within the cells, and what roles they play in health and disease (Chipping Forecast I, 1999; Chipping Forecast II, 2002). Oligonucleotide and cDNA microarrays for transcriptional profiling (Lockhart et al., 1996; Schena et al., 1995) allow measuring such interaction patterns, thus representing an unprecedented opportunity to boost the identification of diagnostic and therapeutic targets (Brown et al., 1999). The principle of a microarray for gene expression analysis is basically that of the classical northern blot extended to the whole genome level. Specifically, mRNA from a given cell line or tissue is labeled with a fluorescent dye and hybridized to a large number of DNA sequences, immobi
Post-analysis follow-up and validation of microarray experiments
Nature Genetics, 2002
the biomedical research community. Although there has been great progress in this field, investigators are still confronted with a difficult question after completing their experiments: how to validate the large data sets that are generated? This review summarizes current approaches to verifying global expression results, discusses the caveats that must be considered, and describes some methods that are being developed to address outstanding problems.