Rank-based transcriptional signatures

A novel method for generation of signature networks as biomarkers from complex high throughput data

Toxicology Letters, 2005

Traditionally, gene signatures are statistically deduced from large gene expression and proteomics datasets and have been applied as an experimental molecular diagnostic technique that is sensitive to experimental design and statistical treatment. We have developed and applied the approach of "signature networks" which overcomes some of the drawbacks of clustering methods. We have demonstrated signature network assembly, functional analysis and logical operations on the networks that can be generated. In addition, we have used this technique in a proof of concept study to compare the effect of differential drug treatment using 4-hydroxytamoxifen and estrogen on the MCF-7 breast cancer cell line from a previously published study. We have shown that the two compounds can be differentiated by the networks of interacting genes. Both networks consist of a core module of genes including c-Fos as part of c-Fos/c-Jun heterodimer and c-Myc which is clearly visible. Using algorithms in our MetaCore™ software we are able to subtract the 4-hydroxytamoxifen and estrogen networks to further understand differences between these two treatments and show that the estrogen network is assembled around the core with other modules essential for all phases of the cell cycle. For example, Cyclin D1 is present in networks for the estrogen treated cells from two separate studies. These signature networks represent an approach to identify biomarkers and a general approach for discovering new relationships in complex high throughput toxicology data.

Iterative signature algorithm for the analysis of large-scale gene expression data

Physical Review E, 2003

We present an approach for the analysis of genome-wide expression data. Our method is designed to overcome the limitations of traditional techniques when applied to large-scale data. Rather than allotting each gene to a single cluster, we assign both genes and conditions to context-dependent and potentially overlapping transcription modules. We provide a rigorous definition of a transcription module as the object to be retrieved from the expression data. An efficient algorithm, which searches for the modules encoded in the data by iteratively refining sets of genes and conditions until they match this definition, is established. Each iteration involves a linear map, induced by the normalized expression matrix, followed by the application of a threshold function. We argue that our method is in fact a generalization of singular value decomposition, which corresponds to the special case where no threshold is applied. We show analytically that for noisy expression data our approach leads to better classification due to the implementation of the threshold. This result is confirmed by numerical analyses based on in silico expression data. We discuss briefly results obtained by applying our algorithm to expression data from the yeast Saccharomyces cerevisiae.
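The iteration described in this abstract (a linear map through the normalized expression matrix, then a threshold) can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the published algorithm: the function names, the z-score normalization, the averaging used as the linear map, and the default thresholds are all illustrative choices.

```python
from statistics import mean, pstdev

def zscore(values):
    """Centre and scale a list; constant lists map to all zeros."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]

def threshold(scores, t):
    """Indices whose z-scored value exceeds t (the threshold function)."""
    return {i for i, z in enumerate(zscore(scores)) if z > t}

def iterate_signature(expr, seed_genes, t_gene=0.5, t_cond=0.5, max_iter=50):
    """expr[g][c] is the expression of gene g under condition c.

    Alternately refines a condition set and a gene set until both stop
    changing. Returns (genes, conds), or two empty sets if a step
    selects nothing.
    """
    if not seed_genes:
        raise ValueError("seed_genes must not be empty")
    n_genes, n_conds = len(expr), len(expr[0])
    # Assumed normalisation: one copy scaled per gene, one per condition.
    per_gene = [zscore(row) for row in expr]
    per_cond = [zscore([expr[g][c] for g in range(n_genes)]) for c in range(n_conds)]

    genes, conds = set(seed_genes), set()
    for _ in range(max_iter):
        # Linear map genes -> conditions, then threshold.
        cond_scores = [sum(per_cond[c][g] for g in genes) / len(genes)
                       for c in range(n_conds)]
        new_conds = threshold(cond_scores, t_cond)
        if not new_conds:
            return set(), set()
        # Linear map conditions -> genes, then threshold.
        gene_scores = [sum(per_gene[g][c] for c in new_conds) / len(new_conds)
                       for g in range(n_genes)]
        new_genes = threshold(gene_scores, t_gene)
        if not new_genes:
            return set(), set()
        if new_genes == genes and new_conds == conds:
            break  # fixed point reached
        genes, conds = new_genes, new_conds
    return genes, conds
```

Starting from a single seed gene that belongs to a co-expressed block, the loop is expected to grow the gene set to the whole block and settle on the conditions under which that block is active. Different seeds can converge to different, possibly overlapping, modules.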

Classification Of Gene Signatures For Their Information Value And Functional Redundancy

Large collections of gene signatures play a pivotal role in interpreting results of omics data analysis but suffer from compositional (large overlap) and functional (redundant read-outs) redundancy, and many gene signatures rarely pop up in statistical tests. Based on pan-cancer data analysis, here we define a restricted set of 962 so-called informative signatures and demonstrate that they are more likely to appear highly enriched in cancer biology studies. We show that the majority of informative signatures conserve their weights for the composing genes (eigengenes) from one cancer type to another. We construct InfoSigMap, an interactive online map showing the structure of compositional and functional redundancies between informative signatures and charting the territories of biological functions accessible through transcriptomic studies. InfoSigMap can be used to visualize in one insightful picture the results of comparative omics data analyses and suggests reconsidering existin...

A feature selection approach for identification of signature genes from SAGE data

BMC Bioinformatics, 2007

Background: One goal of gene expression profiling is to identify signature genes that robustly distinguish different types or grades of tumors. Several tumor classifiers based on expression profiling have been proposed using microarray techniques. Because the probabilistic models of microarray and SAGE technologies differ substantially, it is important to develop suitable techniques to select specific genes from SAGE measurements.

Gene-signatures for early detection of cancers

bioRxiv (Cold Spring Harbor Laboratory), 2023

Background: Gene signatures represent sets of molecular modulations in disease genomes or in cells under specific conditions, and are frequently used to classify samples into different groups for better research or clinical treatment. Multiple methods and applications are available in the literature, but powerful ones that can account for early detection of cancer are still lacking. Method: In this article, gene signatures are identified through a new in-house algorithm (the NCT method) by processing transcriptome data (DEGs extracted from RNA-seq datasets) from a population. The NCT method is used to process population datasets from 28 different human cancers in the TCGA and GTEx databases as an empirical background. The NCTscore is used for optimal clustering of gene sets. The identified gene clusters are evaluated through survival analysis. Gene sets whose disease-vs-normal survival plot has logPvalue < 0.05 are reported as reliable gene signatures. Results: We applied the NCT algorithm to the 28 different cancers and identified novel gene signatures as well as interrelations between different cancers with reference to the identified signatures. Conclusions: The algorithm uses population data and provides validated gene signatures with a reliable capacity to discriminate cancer from normal samples with higher classification performance. The algorithm will be useful for finding signatures in any RNA-seq data.

Finding structured gene signatures

2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, 2008

In the context of gene signature identification from microarray data, a main problem is devising statistical and visual tools to interpret and understand the biological meaning of the selected genes. Most available statistical tools for gene signature extraction typically provide unstructured lists of genes and lack the capability of handling correlation among genes. Recently an algorithm for feature selection, namely elastic net, was proposed that deals with correlated genes in a transparent way. In this work we exploit the form of the output given by elastic net, as used in , to obtain a structured gene signature where genes are arranged in blocks of intra-correlated genes and the blocks are ranked according to a measure of the block's discriminative power. After recalling how elastic net can be used to define nested lists of increasingly intra-correlated genes, we propose an ad hoc agglomerative clustering technique able to refine such a nested output by explicitly identifying modules of correlated genes. We take advantage of such a structure to visualize the correlation patterns underlying the data. The proposed procedure is validated on synthetic data and applied to real gene expression datasets.
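As a rough illustration of grouping selected genes into blocks of correlated genes, the sketch below performs a simple single-linkage agglomeration on Pearson correlation. It is not the authors' ad hoc technique and does not include the elastic net step or the ranking of blocks by discriminative power; the function names and the `min_corr` cut-off are assumptions made for this example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def correlated_blocks(expr_by_gene, min_corr=0.8):
    """Merge genes into blocks while any cross-block pair has |r| >= min_corr.

    expr_by_gene maps a gene name to its expression values across samples.
    """
    blocks = [[g] for g in expr_by_gene]
    merged = True
    while merged:
        merged = False
        for i in range(len(blocks)):
            for j in range(i + 1, len(blocks)):
                linked = any(
                    abs(pearson(expr_by_gene[a], expr_by_gene[b])) >= min_corr
                    for a in blocks[i] for b in blocks[j]
                )
                if linked:
                    # Absorb block j into block i and restart the scan.
                    blocks[i] = blocks[i] + blocks.pop(j)
                    merged = True
                    break
            if merged:
                break
    return blocks
```

Single linkage is used here only because it is short to write; it tends to chain loosely related genes together, so a stricter linkage rule may be preferable on real data.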

A Sample Selection Strategy to Boost the Statistical Power of Signature Detection in Cancer Expression Profile Studies

Anti-Cancer Agents in Medicinal Chemistry, 2013

In case-control profiling studies, increasing the sample size does not always improve statistical power because the variance may also increase if samples are highly heterogeneous. For instance, tumor samples used for gene expression assays are often heterogeneous in terms of tissue composition or mechanism of progression, or both; however, such variation is rarely taken into account in expression profile analysis. We use a prostate cancer prognosis study as an example to demonstrate that solely recruiting more patient samples may not increase power for biomarker detection at all. In response to the heterogeneity due to mixed tissue, we developed a sample selection strategy termed Stepwise Enrichment, by which samples are systematically culled based on tumor content and analyzed with a t-test to determine an optimal threshold for tissue percentage. The selected tissue-percentage threshold identified the most significant data by balancing the sample size and the sample homogeneity; therefore, the power is substantially increased for identifying the prognostic biomarkers in prostate tumor epithelium cells as well as in prostate stroma cells. This strategy can be generally applied to profiling studies where the level of sample heterogeneity can be measured or estimated.
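The culling idea can be sketched for a single gene as follows: for each candidate tissue-percentage threshold, keep only samples at or above it, compare cases and controls with a t statistic, and retain the threshold giving the strongest separation. The abstract does not specify these details, so the Welch form of the t statistic, the `min_per_group` guard and all names are assumptions.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch two-sample t statistic (0.0 if both groups have zero variance)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def stepwise_enrichment(samples, thresholds, min_per_group=3):
    """samples: list of (tumor_fraction, is_case, expression) tuples.

    Returns (threshold, |t|, n_kept) for the best threshold, or None if no
    threshold leaves enough samples in both groups.
    """
    best = None
    for thr in thresholds:
        kept = [s for s in samples if s[0] >= thr]
        cases = [expr for _, is_case, expr in kept if is_case]
        controls = [expr for _, is_case, expr in kept if not is_case]
        if len(cases) < min_per_group or len(controls) < min_per_group:
            continue  # too few samples left for a meaningful test
        t = abs(welch_t(cases, controls))
        if best is None or t > best[1]:
            best = (thr, t, len(kept))
    return best
```

Raising the threshold removes samples, which costs power, but also removes heterogeneous samples, which reduces variance; the scan simply reports where that trade-off is most favorable for the gene being tested.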

Computing Molecular Signatures as Optima of a Bi-Objective Function: Method and Application to Prediction in Oncogenomics

Cancer Informatics, 2015

Background: Filter feature selection methods compute molecular signatures by selecting subsets of genes in the ranking of a valuation function. The motivations for the choice of valuation function are almost always clearly stated, but those for selecting the genes according to their ranking are hardly ever explicit. Method: We addressed the computation of molecular signatures by searching the optima of a bi-objective function whose solution space was the set of all possible molecular signatures, i.e., the set of subsets of genes. The two objectives were the size of the signature (to be minimized) and the interclass distance induced by the signature (to be maximized). Results: We showed that: 1) the convex combination of the two objectives had exactly n optimal non-empty signatures, where n was the number of genes, 2) the n optimal signatures were nested, and 3) the optimal signature of size k was the subset of k top-ranked genes that contributed the most to the interclass distance. We applied our feature selection method to five public datasets in oncology, and assessed the prediction performance of the optimal signatures as input to the diagonal linear discriminant analysis (DLDA) classifier. The performance was at the same level as or better than the best previously reported. The predictions were robust, and the signatures were almost always significantly smaller. We studied in more detail the performance of our predictive modeling on two breast cancer datasets to predict the response to preoperative chemotherapy: the performance was higher than previously reported, the signatures were three times smaller (11 versus 30 gene signatures), and the member genes of the signature were known to be involved in the response to chemotherapy. Conclusions: Defining molecular signatures as the optima of a bi-objective function that combined the signature size and the interclass distance was well founded and efficient for prediction in oncogenomics. The complexity of the computation was very low because the optimal signatures were the sets of genes in the ranking of their valuation. Software can be freely downloaded from http://gardeux-vincent.eu/DeltaRanking.php Keywords: molecular signatures, bi-objective optimization, filter method, feature selection, breast cancer Citation: Gardeux et al. Computing molecular signatures as optima of a Bi-objective function: method and application
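The nesting result can be illustrated with a small sketch. It assumes an additive interclass distance, here the sum over genes of the absolute difference between class means; the paper's actual valuation function is not given in the abstract, and the function names and the weight `alpha` are illustrative. Under that assumption, maximizing `alpha * distance - (1 - alpha) * size` keeps exactly the genes whose weighted contribution exceeds the per-gene size penalty, which is always a top-ranked prefix.

```python
from statistics import mean

def gene_contributions(class_a, class_b):
    """Per-gene |mean_a - mean_b|; rows are samples, columns are genes."""
    n_genes = len(class_a[0])
    return [
        abs(mean([s[g] for s in class_a]) - mean([s[g] for s in class_b]))
        for g in range(n_genes)
    ]

def ranking(contrib):
    """Gene indices sorted by decreasing contribution."""
    return sorted(range(len(contrib)), key=lambda g: contrib[g], reverse=True)

def nested_signatures(contrib):
    """Signature of size k = the k top-ranked genes, for k = 1..n."""
    order = ranking(contrib)
    return [order[:k] for k in range(1, len(order) + 1)]

def optimal_signature(contrib, alpha):
    """Maximise alpha * distance - (1 - alpha) * size for an additive distance.

    A gene is worth including only if alpha * contrib[g] > (1 - alpha).
    """
    return [g for g in ranking(contrib) if alpha * contrib[g] > (1 - alpha)]
```

Sweeping `alpha` from 0 towards 1 walks through the nested signatures from smallest to largest, so no combinatorial search over gene subsets is needed in this simplified setting.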

An algorithm to discover gene signatures with predictive potential

Journal of Experimental & Clinical Cancer Research, 2010

Background: The advent of global gene expression profiling has generated unprecedented insight into our molecular understanding of cancer, including breast cancer. For example, human breast cancer patients display significant diversity in terms of their survival, recurrence, metastasis as well as response to treatment. These patient outcomes can be predicted by the transcriptional programs of their individual breast tumors. Predictive gene signatures allow us to correctly classify human breast tumors into various risk groups as well as to more accurately target therapy to ensure more durable cancer treatment. Results: Here we present a novel algorithm to generate gene signatures with predictive potential. The method first classifies the expression intensity for each gene as determined by global gene expression profiling as low, average or high. The matrix containing the classified data for each gene is then used to score the expression of each gene based on its individual ability to predict the patient characteristic of interest. Finally, all examined genes are ranked based on their predictive ability and the most highly ranked genes are included in the master gene signature, which is then ready for use as a predictor. This method was used to accurately predict the survival outcomes in a cohort of human breast cancer patients. Conclusions: We confirmed the capacity of our algorithm to generate gene signatures with bona fide predictive ability. The simplicity of our algorithm will enable biological researchers to quickly generate valuable gene signatures without specialized software or extensive bioinformatics training.
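The three steps in this abstract (classify each gene's expression as low, average or high; score each gene; rank and keep the top genes) can be sketched as below. The abstract does not give the cut points or the scoring rule, so the z-score cut-off and the score used here (the absolute difference in mean binary outcome between the high and low groups) are assumptions, as are all function names.

```python
from statistics import mean, pstdev

def discretize(values, cut=0.5):
    """Label each value low / average / high using an assumed z-score cut-off."""
    m, s = mean(values), pstdev(values)
    levels = []
    for v in values:
        z = (v - m) / s if s > 0 else 0.0
        levels.append("high" if z > cut else "low" if z < -cut else "average")
    return levels

def gene_score(levels, outcomes):
    """|mean outcome in 'high' patients - mean outcome in 'low' patients|."""
    high = [o for lvl, o in zip(levels, outcomes) if lvl == "high"]
    low = [o for lvl, o in zip(levels, outcomes) if lvl == "low"]
    if not high or not low:
        return 0.0  # gene cannot separate patients if one group is empty
    return abs(mean(high) - mean(low))

def build_signature(expr_by_gene, outcomes, top_n):
    """Rank genes by score and return (top_n gene names, all scores)."""
    scores = {
        gene: gene_score(discretize(values), outcomes)
        for gene, values in expr_by_gene.items()
    }
    ranked = sorted(scores, key=lambda g: scores[g], reverse=True)
    return ranked[:top_n], scores
```

Here `outcomes` is a list of 0/1 labels per patient (for example, event versus no event), aligned with the expression values of each gene. On real data the ranking should be derived on a training cohort and checked on held-out patients.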