Inferring Gene Ontology category membership via cross-experiment gene expression and sequence similarity data analysis

Gene expression correlation and gene ontology-based similarity: an assessment of quantitative relationships

2004

The Gene Ontology and annotations derived from the S. cerevisiae Genome Database were analyzed to calculate the functional similarity of gene products. Three methods for measuring similarity (including a distance-based approach) were implemented. Significant, quantitative relationships between similarity and expression correlation of pairs of genes were detected. Using a known gene expression dataset in yeast, this study compared more than three million pairs of gene products on the basis of these functional properties. Highly correlated genes exhibit strong similarity based on information originating from the gene ontology taxonomies. This similarity is significantly stronger than that observed between weakly correlated genes. This study supports the feasibility of applying gene ontology-driven similarity methods to functional prediction tasks, such as the validation of gene expression analyses and the identification of false positives in protein interaction studies.

Correlation of Genes Similarity Measures Based on GO Terms Similarity and Gene Expression Values

Advances in Intelligent and Soft Computing, 2011

In this paper we present an analysis of whether (and how) the functional similarity of genes can be compared to the similarity derived from raw experimental data. We assume that the information provided by the Gene Ontology database can be regarded as expert knowledge about genes and their functions, and that it should therefore correlate with gene similarity obtained from the analysis of raw expression data. We analyse several measures of gene similarity in the Gene Ontology (GO) domain and compare the results with the gene similarities observed in the expression-level domain. We perform the analysis on three datasets with different characteristics. We show that no single measure gives the best results in all cases, and that the choice of an appropriate gene similarity measure depends on the characteristics of the dataset. In most cases, the best results are obtained by the Avg-sum gene similarity measure in combination with the Path-length GO term similarity measure.
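To make the idea of combining a GO term similarity with a gene-level aggregate concrete, here is a minimal sketch. It is not the paper's definition: the toy ontology, the term names, the 1 / (1 + distance) scoring and the plain average over all term pairs are all assumptions chosen for illustration, and the paper's Avg-sum measure may be defined differently.

```python
from collections import deque
from itertools import product

# Hypothetical toy ontology (not real GO identifiers): child term -> parent terms.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
}


def neighbours(term):
    """Parents and children of a term, treating ontology edges as undirected."""
    children = [child for child, parents in PARENTS.items() if term in parents]
    return PARENTS[term] + children


def path_length(a, b):
    """Shortest number of edges between two terms, found by breadth-first search."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        term, dist = queue.popleft()
        if term == b:
            return dist
        for nxt in neighbours(term):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("terms are not connected")


def term_similarity(a, b):
    """Assumed scoring: identical terms score 1, more distant terms score less."""
    return 1.0 / (1.0 + path_length(a, b))


def gene_similarity(terms_a, terms_b):
    """Assumed aggregate: mean term similarity over all cross-gene term pairs."""
    pairs = list(product(terms_a, terms_b))
    return sum(term_similarity(a, b) for a, b in pairs) / len(pairs)


score = gene_similarity(["dna_binding"], ["rna_binding", "catalysis"])
```

Swapping `term_similarity` or the aggregate in `gene_similarity` for another definition is the kind of choice the abstract reports as dataset dependent.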

Effective similarity measures for expression profiles

Bioinformatics, 2006

It is commonly accepted that genes with similar expression profiles are functionally related. However, there are many ways to measure the similarity of expression profiles, and it is not clear a priori which is the most effective. Moreover, no clear distinction has so far been made as to the type of functional link between genes that microarray data suggest. Similarly expressed genes can be part of the same complex as interacting partners; they can participate in the same pathway without interacting directly; they can perform similar functions; or they can simply have similar regulatory sequences. Here we conduct a study of the notion of functional link as implied by expression data. We analyze different similarity measures of gene expression profiles and assess their usefulness and robustness in detecting biological relationships by comparing the similarity scores with results obtained from databases of interacting proteins, promoter signals, and cellular pathways, as well as through sequence comparisons. We also introduce variations on similarity measures that are based on statistical analysis and that better discriminate between genes that are functionally close and genes that are functionally distant. Our tools can be used to assess other similarity measures for expression profiles, and are accessible at biozon.org/tools/expression/.
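As an illustration of how the choice of measure changes the score for the same pair of profiles, the sketch below computes Pearson and Spearman correlation in plain Python. The two profiles are made-up values, and these two measures are offered only as familiar examples; the abstract does not say which measures the study evaluated.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles.

    Constant profiles (zero variance) are not handled and raise ZeroDivisionError.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed profiles."""
    return pearson(ranks(x), ranks(y))


# Hypothetical expression values across five conditions.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [1.0, 4.0, 9.0, 16.0, 25.0]

linear_score = pearson(gene_a, gene_b)
rank_score = spearman(gene_a, gene_b)
```

Here `gene_b` rises monotonically but not linearly with `gene_a`, so the rank-based score is essentially 1 while the linear score is slightly lower.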

An Efficient Measure of Similarity between Gene Expression Profiles through Data Transformations

2006

Background: Clustering methods have been widely applied to gene expression data in order to group genes sharing common or similar expression profiles into discrete functional groups. In such analyses, designing an appropriate (dis)similarity measure is critical. In this study, we aim to develop a new distance measure for gene expression profiles. The new measure is expected to be especially efficient when the shape of the expression profile is vital in determining the gene relationship, yet the expression magnitude should also be accounted for to some extent. Results: The new measure, named TransChisq, was developed by separately modeling the shape and magnitude information and then using the estimated shape and magnitude parameters to define a distance measure in a new feature space. The feature space was constructed for the specific clustering purpose of grouping genes with similarly shaped expression curves, while also taking the magnitude information into account when determining shape similarity. The new measure was incorporated into a k-means clustering procedure for performing clustering analyses. Results from applications to a simulation dataset, a developing mouse retina SAGE dataset, a small yeast sporulation cDNA dataset and a maize root Affymetrix microarray dataset show the clear advantages of our method over others. Conclusions: The method described in this paper shows great promise in capturing underlying biological relationships in gene expression profiles. This study also demonstrates that constructing an appropriate feature space for a given clustering purpose is critical for a successful distance measure. We hope our method provides some new insights for further investigation in analyzing gene expression data. The clustering algorithms are available upon request.

Validation and functional annotation of expression-based clusters based on gene ontology

2006

The biological interpretation of large-scale gene expression data is one of the paramount challenges in current bioinformatics. In particular, placing the results in the context of other available functional genomics data, such as existing bio-ontologies, has already provided substantial improvement for detecting and categorizing genes of interest. One common approach is to look for functional annotations that are significantly enriched within a group or cluster of genes, as compared to a reference group.
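One widely used way to quantify such enrichment is a one-sided hypergeometric test, sketched below. This is an assumption about the general approach, not a description of this paper's method, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, cluster, hits):
    """Probability of seeing at least `hits` annotated genes in the cluster.

    population: number of genes in the reference group
    annotated:  reference genes carrying the annotation of interest
    cluster:    number of genes in the cluster (drawn without replacement)
    hits:       cluster genes carrying the annotation
    """
    upper = min(annotated, cluster)
    total = comb(population, cluster)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 reference genes, 3 annotated, a cluster of 3, all annotated.
p_all_three = enrichment_p_value(population=10, annotated=3, cluster=3, hits=3)
```

In practice many annotations are tested at once, so the resulting p-values usually need a multiple-testing correction, which this sketch leaves out.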

Estimating mutual information using B-spline functions - an improved similarity measure for analysing gene expression data

BMC Bioinformatics, 2004

The information theoretic concept of mutual information provides a general framework to evaluate dependencies between variables. In the context of the clustering of genes with similar patterns of expression it has been suggested as a general quantity of similarity to extend commonly used linear measures. Since mutual information is defined in terms of discrete variables, its application to continuous data requires the use of binning procedures, which can lead to significant numerical errors for datasets of small or moderate size.
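For reference, the naive binning estimator that the abstract describes as error-prone on small datasets can be sketched as follows. This is the plain histogram approach with equal-width bins, not the B-spline estimator the paper proposes; the bin count and the sample values are assumptions.

```python
import math
from collections import Counter


def binned_mutual_information(x, y, bins=3):
    """Histogram estimate of mutual information in bits, using equal-width bins."""

    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # constant input falls into a single bin
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    joint = Counter(zip(bx, by))
    marginal_x, marginal_y = Counter(bx), Counter(by)

    mi = 0.0
    for (i, j), count in joint.items():
        # p(i, j) * log2( p(i, j) / (p(i) * p(j)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (marginal_x[i] * marginal_y[j]))
    return mi


# Hypothetical profiles: one compared with itself, one compared with a constant.
profile = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
mi_self = binned_mutual_information(profile, profile)
mi_constant = binned_mutual_information(profile, [7.0] * 6)
```

Each value is assigned to exactly one bin, so small shifts near a bin boundary can change the estimate, which is one source of the errors noted for small or moderate sample sizes.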

MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data

Genome biology, 2003

MAPPFinder is a tool that creates a global gene-expression profile across all areas of biology by integrating the annotations of the Gene Ontology (GO) Project with the free software package GenMAPP (http://www.GenMAPP.org). The results are displayed in a searchable browser, allowing the user to rapidly identify GO terms with over-represented numbers of gene-expression changes. Clicking on GO terms generates GenMAPP graphical files in which gene relationships can be explored and annotated, and these files can be freely exchanged.

Comparison Analysis of Gene Expression Profiles Proximity Metrics

Symmetry

The problems of gene regulatory network (GRN) reconstruction and the creation of effective disease diagnostic systems based on gene expression data are among the current directions of modern bioinformatics. In this manuscript, we present the results of research evaluating the effectiveness of the most commonly used metrics for estimating the proximity of gene expression profiles, which can be used to extract groups of informative gene expression profiles while taking into account the states of the investigated samples. Symmetry is very important in the field of both genes’ and/or proteins’ interaction since it undergirds essentially all interactions between molecular components in the GRN and extraction of gene expression profiles, which allows us to identify how the investigated biological objects (disease, state of patients, etc.) contribute to the further reconstruction of GRN in terms of both the symmetry and understanding the mechanism of molecular element intera...