PASI: A novel pathway method to identify delicate group effects
Related papers
Observer-Biased Analysis of Gene Expression Profiles
New Normalization Method and Error Analysis for Gene Expression Microarray Data
2000
The recent development of complementary DNA microarray technology provides a powerful analytical tool for genetic research. This tool allows expression levels to be studied in parallel, which represents an enormous gain in experimental time. But whenever expression data are compared using measurements from a single array or multiple arrays, the question of normalizing the data arises. The following question needs to be answered: does the variation in the intensity data represent true variation in the expression level of the genes present in the analysis, or is this variation the result of experimental variability? If this question is not resolved, any further data mining on the data is worthless. The role of normalization is to separate true variation in expression values from differences due to experimental variability. As mentioned before, microarray technology is a powerful tool that makes it possible to study the expression level of thousands of genes at the same time, but in an experiment involving thousands of genes usually only a few are really of interest: those that overexpress as a "response" to the experiment performed. It is therefore useful to develop a method that provides statistical information about these genes while avoiding processing the whole set of genes. This technical memo proposes a new normalization method and error analysis that will ultimately provide the scientist with a statistical tool for focusing on a considerably reduced subset of genes from an originally much larger dataset. The usefulness of this type of reduction is justified by further analysis, for instance subsequent clustering techniques applied to the reduced data set.
A Comparative Study of Methods of Analyzing Gene Expression Data
2004
In analyzing microarray data it is often necessary to detect genes that are differentially expressed between two or more samples. This project aims to apply two methods to address statistical issues that arise when identifying differentially expressed genes. The first is a gene-by-gene analysis that attempts to overcome the small sample size issue that is often present in microarray data sets. By averaging the variances of genes with similar expression levels, we are able to stabilize the test statistics used in determining significant genes and obtain more powerful tests. When looking at thousands of tests, one for each gene, problems arise involving the type I error rates. This leads to multiple testing issues that must be addressed. We applied many methods of correcting or adjusting the p-values for multiple testing. Based on this study, the false discovery rate method appears to provide a reasonable balance between the type I error rate and allowing sufficient power to detect diff...
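As an illustration of the multiple-testing adjustment mentioned in this abstract, the following is a minimal sketch of false discovery rate adjustment. It assumes the Benjamini-Hochberg step-up procedure, which the abstract does not name; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used here as one common FDR procedure; the abstract
    only says "the false discovery rate method".
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-gene p-values, one test per gene.
    raw = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(raw))
```

Genes whose adjusted p-value falls below a chosen threshold would be reported as significant; the threshold itself is a choice left to the analyst.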
Background: A common task in analyzing microarray data is to determine which genes are differentially expressed across two (or more) kinds of tissue samples or samples submitted under experimental conditions. Several statistical methods have been proposed to accomplish this goal, generally based on measures of distance between classes. It is well known that biological samples are heterogeneous because of factors such as molecular subtypes or genetic background that are often unknown to the experimenter. For instance, in experiments which involve molecular classification of tumors it is important to identify significant subtypes of cancer. Bimodal or multimodal distributions often reflect the presence of subsample mixtures. Consequently, there can be genes differentially expressed on sample subgroups which are missed if usual statistical approaches are used. In this paper we propose a new graphical tool which not only identifies genes with up and down regulations, but also genes with differential expression in different subclasses, that are usually missed if current statistical methods are used. This tool is based on two measures of distance between samples, namely the overlapping coefficient (OVL) between two densities and the area under the receiver operating characteristic (ROC) curve. The methodology proposed here was implemented in the open-source R software. Results: This method was applied to a publicly available dataset, as well as to a simulated dataset. We compared our results with the ones obtained using some of the standard methods for detecting differentially expressed genes, namely Welch t-statistic, fold change (FC), rank products (RP), average difference (AD), weighted average difference (WAD), moderated t-statistic (modT), intensity-based moderated t-statistic (ibmT), significance analysis of microarrays (samT) and area under the ROC curve (AUC).
On both datasets, all differentially expressed genes with bimodal or multimodal distributions were not selected by all standard selection procedures. We also compared our results with (i) area between ROC curve and rising area (ABCR) and (ii) the test for not proper ROC curves (TNRC). We found our methodology to be more comprehensive, because it detects both bimodal and multimodal distributions and allows different variances to be considered on both samples. Another advantage of our method is that the behavior of different kinds of differentially expressed genes can be analyzed graphically.
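The two distance measures named in this abstract can be sketched for a single gene as follows. This is only an illustration under stated assumptions: the AUC is the empirical (Mann-Whitney) estimate, and the OVL is approximated with a shared-bin histogram rather than the density estimates the authors may have used in their R implementation. The function names `auc` and `ovl_histogram` and the toy data are hypothetical.

```python
def auc(controls, cases):
    """Empirical AUC: P(case > control) + 0.5 * P(case == control)."""
    wins = 0.0
    for x in controls:
        for y in cases:
            if y > x:
                wins += 1.0
            elif y == x:
                wins += 0.5
    return wins / (len(controls) * len(cases))


def ovl_histogram(a, b, bins=10):
    """Histogram approximation of the overlapping coefficient (OVL).

    Assumption: a shared-bin histogram stands in for the two densities;
    the result depends on the number of bins.
    """
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    if hi == lo:
        return 1.0  # all values identical, so the samples overlap fully
    width = (hi - lo) / bins

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp the maximum value into the last bin.
            k = min(int((v - lo) / width), bins - 1)
            counts[k] += 1
        return [c / len(values) for c in counts]

    pa, pb = proportions(a), proportions(b)
    # Overlap is the shared mass in each bin, summed over bins.
    return sum(min(x, y) for x, y in zip(pa, pb))


if __name__ == "__main__":
    # Toy gene: the case group splits into two subgroups around the controls.
    controls = [0.0, 0.0, 0.0, 0.0]
    cases = [-2.0, -2.0, 2.0, 2.0]
    print(auc(controls, cases), ovl_histogram(controls, cases, bins=4))
```

In this toy case the AUC is 0.5 while the histogram OVL is 0, which is the kind of subgroup pattern that a statistic based on a single shift between classes would tend to miss.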
A framework for significance analysis of gene expression data using dimension reduction methods
BMC Bioinformatics, 2007
The most popular methods for significance analysis on microarray data are well suited to find genes differentially expressed across predefined categories. However, identification of features that correlate with continuous dependent variables is more difficult using these methods, and the long lists of significant genes returned are not easily probed for co-regulations and dependencies. Dimension reduction methods are widely used in the microarray literature for classification or for obtaining low-dimensional representations of data sets. These methods have an additional interpretative strength that is often not fully exploited when expression data are analysed. In addition, significance analysis may be performed directly on the model parameters to find genes that are important for any number of categorical or continuous responses. We introduce a general scheme for analysis of expression data that combines significance testing with the interpretative advantages of the dimension reduction methods. This approach is applicable both for explorative analysis and for classification and regression problems.
Robust Local Normalization Of Gene Expression Microarray Data
2000
analysis of expression microarray data is normalization: the process of adjusting the signal from 2 different reporter channels on a single microarray or a single-reporter channel on multiple microarrays to a common scale. Current methods involve either normalization to some representative statistic of all of the data or to a statistic of some subset of the data, such as a set of "housekeeping" genes. The first method fails to correct any non-linearity between the data channels; the second method is sometimes undone by differential expression of genes that were thought to be unregulated. Agilent has invented a normalization method that uses robust statistical methods to establish the "central tendency" of a set of differential expression data. Normalization utilizes the data clustered near this central tendency; these points comprise an experimentally determined set of housekeeping genes. The resulting algorithm is rapid, robust and capable of correctly normalizing microarray data from different platforms, such as cDNA and in situ synthesized oligonucleotide microarrays. In addition, the method provides an easily interpreted measurement of the degree to which the normalization has altered the original data.
Microarray probe expression measures, data normalization and statistical validation
Comparative and functional genomics, 2003
DNA microarray technology is a high-throughput method for gaining information on gene function. Microarray technology is based on the ordered deposition or synthesis, on a solid surface, of thousands of EST sequences/genes/oligonucleotides. Due to the high number of generated datapoints, computational tools are essential in microarray data analysis and mining to extract knowledge from experimental results. In this review, we will focus on some of the methodologies currently available to define gene expression intensity measures, microarray data normalization, and statistical validation of differential expression.
Bioinformatics and biology insights, 2009
Microarray technology has become highly valuable for identifying complex global changes in gene expression patterns. The assignment of functional information to these complex patterns remains a challenging task in effectively interpreting data and correlating results from across experiments, projects and laboratories. Methods which allow the rapid and robust evaluation of multiple functional hypotheses increase the power of individual researchers to mine gene expression data more efficiently. We have developed gene set matrix analysis (GSMA) as a useful method for the rapid testing of group-wise up- or down-regulation of gene expression simultaneously for multiple lists of genes (gene sets) against entire distributions of gene expression changes (datasets) for single or multiple experiments. The utility of GSMA lies in its flexibility to rapidly poll gene sets related by known biological function or as designated solely by the end-user against large numbers of datasets simultan...