Identification and Analysis of microRNA-Disease Associations with Kernelized Bayesian Matrix Factorization
Related papers
IRJET- Different Approaches for Finding Micro RNA and Disease Association
IRJET, 2020
MicroRNAs (miRNAs) are a class of non-coding RNAs approximately 22 nucleotides long. Increasing evidence has shown that miRNAs play important roles in many human diseases. Identifying disease-related miRNAs helps to uncover the underlying pathogenesis of diseases. More and more experimentally verified associations between miRNAs and diseases have been reported in recent studies, providing useful information for discovering new miRNA-disease associations. In this work, we propose a computational framework, KBMFMDI, to predict associations between miRNAs and diseases based on their similarities. The sequence and functional information of miRNAs is used to measure similarity among miRNAs, while the semantic and functional information of diseases is used to measure similarity among diseases. In addition, the kernelized Bayesian matrix factorization approach and self-organizing maps are employed to infer potential miRNA-disease associations by integrating these data sources.
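The abstract describes scoring miRNA-disease pairs by factorizing the association matrix through miRNA and disease similarity kernels. The sketch below illustrates only that general idea, under loud assumptions: it is a simplified, non-Bayesian analogue (plain gradient descent on a regularized squared loss) rather than the kernelized Bayesian matrix factorization used by KBMFMDI, it omits the self-organizing maps, and the RBF kernels on random features are hypothetical stand-ins for the miRNA and disease similarity measures. The helper names `kernel_mf_scores`, `normalize_kernel` and `rbf_kernel` are illustrative only.

```python
import numpy as np

def normalize_kernel(K):
    # Scale a symmetric similarity kernel so its largest absolute eigenvalue is 1;
    # this keeps the fixed step size below stable (a choice made for this sketch).
    return K / np.max(np.abs(np.linalg.eigvalsh(K)))

def kernel_mf_scores(Y, K_m, K_d, rank=3, lam=0.01, lr=0.02, iters=3000, seed=0):
    """Point-estimate kernelized matrix factorization (a simplification, NOT KBMF).

    Y   : (n_m, n_d) binary miRNA-disease association matrix
    K_m : (n_m, n_m) miRNA similarity kernel
    K_d : (n_d, n_d) disease similarity kernel
    Latent factors are kernel projections U = K_m A and V = K_d B;
    the predicted score matrix is U V^T.
    """
    rng = np.random.default_rng(seed)
    K_m, K_d = normalize_kernel(K_m), normalize_kernel(K_d)
    A = 0.1 * rng.standard_normal((K_m.shape[0], rank))
    B = 0.1 * rng.standard_normal((K_d.shape[0], rank))
    for _ in range(iters):
        U, V = K_m @ A, K_d @ B
        R = U @ V.T - Y  # residual of the current fit
        # Gradients of 0.5*||R||^2 + 0.5*lam*(||A||^2 + ||B||^2)
        grad_A = K_m.T @ (R @ V) + lam * A
        grad_B = K_d.T @ (R.T @ U) + lam * B
        A -= lr * grad_A
        B -= lr * grad_B
    return (K_m @ A) @ (K_d @ B).T

def rbf_kernel(X, gamma=1.0):
    # Gaussian (RBF) kernel between the rows of X
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

# Toy usage with hypothetical data: 6 miRNAs x 4 diseases
Y = np.array([[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 0, 0],
              [0, 0, 1, 0], [1, 0, 0, 1], [0, 0, 1, 1]], dtype=float)
rng = np.random.default_rng(1)
scores = kernel_mf_scores(Y, rbf_kernel(rng.random((6, 3))), rbf_kernel(rng.random((4, 3))))
```

High entries of `scores` at positions where `Y` is 0 would be read as candidate new associations. A Bayesian treatment would additionally place priors on the projection and latent matrices and infer their posteriors instead of a single point estimate.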
MicroRNAs (miRNAs) are significant regulators in various biological processes. They may become promising biomarkers or therapeutic targets, providing a new perspective on the diagnosis and treatment of multiple diseases. Since experimental methods are costly and resource-consuming, predicting disease-related miRNAs with computational methods is greatly needed. In this study, we developed MDA-CF to identify underlying miRNA-disease associations based on a cascade forest model. In this method, multi-source information was integrated to represent miRNAs and diseases comprehensively, and an autoencoder was used for dimension reduction to obtain the optimal feature space. The cascade forest model was then employed for miRNA-disease association prediction. As a result, the average AUC of MDA-CF was 0.9464 on HMDD v3.2 in five-fold cross-validation. Compared with previous computational methods, MDA-CF performed better on HMDD v2.0, with an average AUC of 0.9258. Moreover, MDA-CF was applied to colon neoplasm, breast neoplasm, and gastric neoplasm, and 100%, 86%, and 88% of the top 50 potential miRNAs, respectively, were validated by authoritative databases. In conclusion, MDA-CF appears to be a reliable method for uncovering disease-associated miRNAs. The source code of MDA-CF is available at https://github.com/a1622108/MDA-CF.
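Case-study figures such as "86% of the top 50 potential miRNAs were validated" can be computed as sketched below. This is a minimal illustration under stated assumptions: miRNAs already known for the disease are excluded before ranking, and `scores`, `known` and `validated` are hypothetical inputs, not MDA-CF's code or data.

```python
def top_k_validation_rate(scores, known, validated, k=50):
    """Fraction of the top-k new candidates for one disease confirmed externally.

    scores    : dict, candidate miRNA -> predicted association score
    known     : set of miRNAs already linked to the disease in the training data
    validated : set of miRNAs linked to the disease by an external database
    """
    candidates = [m for m in scores if m not in known]
    top = sorted(candidates, key=lambda m: scores[m], reverse=True)[:k]
    if not top:
        return 0.0
    return sum(m in validated for m in top) / len(top)

# Hypothetical toy example with k=3 instead of 50 (2 of the top 3 are confirmed)
scores = {"m1": 0.9, "m2": 0.8, "m3": 0.7, "m4": 0.6, "m5": 0.1}
print(top_k_validation_rate(scores, known={"m1"}, validated={"m2", "m4"}, k=3))
```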
Scientific Reports, 2020
In recent years, accumulating evidence has shown that microRNAs (miRNAs) play an important role in the exploration and treatment of diseases, so detecting associations between miRNAs and diseases has drawn increasing attention. However, traditional experimental methods are costly and time-consuming, whereas computational methods can predict potential miRNA-disease associations more systematically and efficiently. In this work, we propose a novel network embedding-based heterogeneous information integration method to predict miRNA-disease associations. Specifically, a heterogeneous information network is constructed by combining the known associations among lncRNAs, drugs, proteins, diseases, and miRNAs. The network embedding method Learning Graph Representations with Global Structural Information (GraRep) is then employed to learn embeddings of the nodes in the heterogeneous information network. The embedding representations of miRNAs and diseases are integrated with their attribute information (e.g., miRNA sequence information and disease semantic similarity) to represent miRNA-disease association pairs. Finally, a Random Forest (RF) classifier is used to predict potential miRNA-disease associations. Under 5-fold cross-validation, our method achieved 85.11% prediction accuracy and 80.41% sensitivity, with an AUC of 91.25%. In addition, in case studies of three major human diseases, 45 (Colon Neoplasms), 42 (Breast Neoplasms), and 44 (Esophageal Neoplasms) of the top 50 predicted miRNAs were verified by other miRNA-disease association databases. In conclusion, the experimental results suggest that our method can be a powerful and useful tool for predicting potential miRNA-disease associations.
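The accuracy, sensitivity and AUC reported in this abstract are standard evaluation metrics. A minimal Python sketch of how they can be computed for a single cross-validation fold follows; the 0.5 decision threshold and the toy labels and scores are assumptions for illustration, not values from the paper.

```python
def classification_metrics(labels, scores, threshold=0.5):
    """Return (accuracy, sensitivity, AUC) for binary labels and real-valued scores."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    # AUC = probability that a random positive outscores a random negative
    # (ties count as one half); O(P * N), which is fine for a sketch
    pos = [s for y, s in pairs if y == 1]
    neg = [s for y, s in pairs if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return accuracy, sensitivity, auc

# Hypothetical fold: two positive and two negative miRNA-disease pairs
print(classification_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # (0.5, 0.5, 0.75)
```

Under 5-fold cross-validation, these per-fold values would typically be averaged across the five folds.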
Prediction of miRNA-disease associations with a vector space model
Scientific Reports, 2016
MicroRNAs play critical roles in many physiological processes. Their dysregulations are also closely related to the development and progression of various human diseases, including cancer. Therefore, identifying new microRNAs that are associated with diseases contributes to a better understanding of pathogenicity mechanisms. MicroRNAs also represent a tremendous opportunity in biotechnology for early diagnosis. To date, several in silico methods have been developed to address the issue of microRNA-disease association prediction. However, these methods have various limitations. In this study, we investigate the hypothesis that information attached to miRNAs and diseases can be revealed by distributional semantics. Our basic approach is to represent distributional information on miRNAs and diseases in a high-dimensional vector space and to define associations between miRNAs and diseases in terms of their vector similarity. Cross validations performed on a dataset of known miRNA-disease associations demonstrate the excellent performance of our method. Moreover, the case study focused on breast cancer confirms the ability of our method to discover new disease-miRNA associations and to identify putative false associations reported in databases.
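The core idea of this abstract, scoring a miRNA-disease pair by the similarity of their vectors, can be sketched with cosine similarity. The vectors below are hypothetical co-occurrence counts over three unnamed context terms; how the paper actually builds its distributional vectors is not specified here, and cosine is assumed as the similarity measure.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_mirnas_for_disease(disease_vec, mirna_vecs):
    """Order miRNAs by decreasing vector similarity to the disease."""
    return sorted(mirna_vecs, key=lambda m: cosine(disease_vec, mirna_vecs[m]), reverse=True)

# Hypothetical vectors: co-occurrence counts with three context terms
mirna_vecs = {"miR-A": [3, 0, 2], "miR-B": [0, 4, 0], "miR-C": [1, 1, 1]}
disease_vec = [2, 0, 1]
print(rank_mirnas_for_disease(disease_vec, mirna_vecs))  # ['miR-A', 'miR-C', 'miR-B']
```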
A consistent evaluation of miRNA-disease association prediction models
2020
Motivation: A variety of machine learning-based approaches have been applied to predicting miRNA-disease associations. Although promising, the evaluation setups used to measure prediction performance are inconsistent, making it difficult to assess actual progress. A more acute problem is that most models overlook data leakage caused by the use of precomputed miRNA and disease similarity features. Results: We uncover a crucial data leakage problem in the evaluation of machine learning models for miRNA-disease association prediction. In particular, information from the test set, in the form of precomputed input features for miRNAs and diseases, is used during model training. Moreover, we point out problems in the performance metrics widely used in model evaluation. While resolving the issues of data leakage and model evaluation, we perform an in-depth study of 3 recent models along with our proposed 9 variants of these models. Our proposed variants have resulted in impro...
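The leakage described in this abstract arises when similarity features are precomputed from the full association matrix, so held-out associations shape the training inputs. A minimal numpy sketch of the remedy is to hide the test pairs before computing features, recomputing them for every fold. It assumes, hypothetically, that miRNA similarity is the cosine similarity of association profiles; the features used by the evaluated models may differ.

```python
import numpy as np

def profile_similarity(Y):
    """Cosine similarity between miRNA association profiles (rows of Y)."""
    norms = np.linalg.norm(Y, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero profiles as zero vectors
    Z = Y / norms
    return Z @ Z.T

def leakage_free_similarity(Y, test_pairs):
    """Compute similarity features only from training associations."""
    Y_train = Y.astype(float)  # astype returns a copy, so Y is not modified
    for i, j in test_pairs:
        Y_train[i, j] = 0.0  # hide each held-out association before featurizing
    return profile_similarity(Y_train)

Y = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]], dtype=float)
leaky = profile_similarity(Y)                  # features see the test pair (0, 1)
clean = leakage_free_similarity(Y, [(0, 1)])   # features do not
print(leaky[0, 1], clean[0, 1])
```

In this toy case, the similarity between the first two miRNAs drops from 1 to 1/√2 once the held-out pair is hidden, which shows how much a leaked test label can inflate a feature.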
Prediction of microRNA-disease associations based on distance correlation set
BMC Bioinformatics, 2018
Recently, numerous laboratory studies have indicated that many microRNAs (miRNAs) are involved in and associated with human diseases and can serve as potential biomarkers and drug targets. Therefore, developing effective computational models for the prediction of novel associations between diseases and miRNAs could be beneficial for achieving an understanding of disease mechanisms at the miRNA level and the interactions between diseases and miRNAs at the disease level. Thus far, only a few miRNA-disease association pairs are known, and models analyzing miRNA-disease associations based on lncRNA are limited. In this study, a new computational method based on a distance correlation set is developed to predict miRNA-disease associations (DCSMDA) by integrating known lncRNA-disease associations, known miRNA-lncRNA associations, disease semantic similarity, and various lncRNA and disease similarity measures. The novelty of DCSMDA is due to the construction of a miRNA-lncRNA-disease netwo...
Statistical analysis of a Bayesian classifier based on the expression of miRNAs
BMC Bioinformatics, 2015
Background: During the last decade, many scientific works have investigated the possible use of miRNA levels as diagnostic and prognostic tools for different kinds of cancer. Developing reliable classifiers requires tackling several crucial aspects, some of which have been widely overlooked in the scientific literature: the distribution of the measured miRNA expressions and the statistical uncertainty that affects the parameters characterizing a classifier. In this paper, these topics are analysed in detail through a model problem, i.e. the development of a Bayesian classifier that, on the basis of the expression of miR-205, miR-21 and snRNA U6, discriminates samples into two classes of pulmonary tumors: adenocarcinomas and squamous cell carcinomas. Results: We proved that the variance of miRNA expression triplicates is well described by a normal distribution and that triplicate averages also follow normal distributions. We provide a method to enhance a classifier's performance by exploiting the correlations between the class-discriminating miRNA and the expression of an additional normalized miRNA. Conclusions: By exploiting the normal behavior of triplicate variances and averages, invalid samples (outliers) can be identified by checking their variability via a chi-square test or their displacement from the respective population mean via Student's t-test. Finally, the normal behavior allows the Bayesian classifier to be optimally set and its performance and related uncertainty to be determined.
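Because the abstract's classifier rests on normally distributed expression values, a two-class Bayesian classifier with normal class-conditional densities is a natural illustration. The minimal sketch below uses Python's `statistics.NormalDist`. It is a simplification under stated assumptions: one hypothetical normalized expression value per sample instead of the paper's combination of miR-205, miR-21 and snRNA U6, equal class priors, and made-up training values.

```python
from statistics import NormalDist, mean, stdev

def fit_class(values):
    """Fit a normal class-conditional distribution to training expression values."""
    return NormalDist(mean(values), stdev(values))

def posterior_adc(x, adc, scc, prior_adc=0.5):
    """P(adenocarcinoma | x) by Bayes' rule with normal likelihoods."""
    joint_adc = adc.pdf(x) * prior_adc
    joint_scc = scc.pdf(x) * (1.0 - prior_adc)
    total = joint_adc + joint_scc
    # If both densities underflow far from both classes, fall back to the prior
    return joint_adc / total if total > 0 else prior_adc

# Hypothetical normalized expression values of one class-discriminating miRNA
adc = fit_class([1.0, 1.2, 0.8, 1.1])
scc = fit_class([3.0, 3.2, 2.9, 3.1])
for x in (1.05, 2.0, 3.05):
    p = posterior_adc(x, adc, scc)
    print(x, round(p, 3), "ADC" if p > 0.5 else "SCC")
```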
Bioinformatics, 2012
There have been many successful experimental and bioinformatics efforts to elucidate transcription factor (TF)-target networks in several organisms. For many organisms, these annotations are complemented by good-quality miRNA-target networks. However, attempts to use these networks in combination with gene expression data to draw conclusions about TF or miRNA activity are still relatively sparse. Results: In this study, we propose Bayesian inference of regulation of transcriptional activity (BIRTA) as a novel approach to infer both TF and miRNA activities from combined miRNA and mRNA expression data in a condition-specific way. That is, our model explains mRNA and miRNA expression under a specific experimental condition by the activities of certain miRNAs and TFs, which allows differentiating between switches from active to inactive (negative switch) and from inactive to active (positive switch) forms. Extensive simulations reveal that our model achieves good prediction performance in comparison to other approaches. Furthermore, the utility of BIRTA is demonstrated on Escherichia coli data comparing aerobic and anaerobic growth conditions, and on human expression data from pancreatic and ovarian cancer. Availability and implementation: The method is implemented in the R package birta, which is freely available for Bioconductor (>= 2.10) on
Detecting differential biomarkers in a genomic study poses big challenges for statistical analysis because of the large number of markers and the small number of samples. Due to the large number of markers, Bayesian hierarchical approaches are not popular for analyzing such data. However, the number of microRNAs in microRNA-microarray experiments is low, typically in the hundreds, compared with the few thousand genes measured in conventional gene expression profiling. This motivates us to introduce a Bayesian regression technique to analyze microRNA expression data. We incorporate patient covariate information and a prior on the regression coefficients into the regression models and estimate the area under the receiver operating characteristic curve (AUC) comparing two conditions. The Bayesian estimate of the AUC and its variance are used to develop a statistic for testing whether the AUC for each microRNA equals 0.5, allowing a different variance for each microRNA. Our Bayesian regression approach provides a new inferential framework for such genomic data. We focus on the primary step of the microRNA selection process, namely ranking microRNAs by the test statistic to identify differential expression under two conditions. A dataset is analyzed to illustrate the method, and a simulation study is carried out to assess the relative performance of different statistical measures. Simulation results suggest that, in identifying truly differentially expressed microRNAs, the Bayesian technique performs better than a linear regression model, especially with small sample sizes and in nonlinear scenarios.
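The abstract does not give the exact form of its test statistic. A natural reading is a standardized distance of the Bayesian AUC estimate from 0.5, and the minimal sketch below assumes that form: (posterior mean - 0.5) / posterior standard deviation, computed separately for each microRNA so that each uses its own variance. The posterior draws are hypothetical (in practice they would come from posterior sampling of the regression model), and ranking by absolute value, so that AUCs far below 0.5 also rank high, is an assumption.

```python
from statistics import mean, stdev

def auc_statistic(auc_draws):
    """(posterior mean AUC - 0.5) / posterior SD, using each microRNA's own variance."""
    return (mean(auc_draws) - 0.5) / stdev(auc_draws)

def rank_mirnas(posterior_auc):
    """Rank microRNAs by |statistic|: AUC far from 0.5 in either direction ranks higher."""
    stats = {m: auc_statistic(draws) for m, draws in posterior_auc.items()}
    return sorted(stats, key=lambda m: abs(stats[m]), reverse=True)

# Hypothetical posterior draws of the AUC for three microRNAs
posterior_auc = {
    "m1": [0.80, 0.85, 0.90],
    "m2": [0.45, 0.55, 0.50],
    "m3": [0.20, 0.25, 0.15],
}
print(rank_mirnas(posterior_auc))  # ['m1', 'm3', 'm2']
```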