
COSNet: Cost Sensitive Network for node label prediction on graphs with highly unbalanced labelings


What is COSNet?

COSNet (COst Sensitive neural Network) [1,2] is a method for learning node labels in biological networks with a high prevalence of negative instances. Examples of this setting are the automated prediction of protein functions, gene-disease prioritization and the drug repositioning problem.

COSNet is based on a cost-sensitive family of parametric Hopfield networks, whose characteristics can be summarized as follows:

Architecture

Framework

COSNet has three main modules:

1) Data processing. The package provides functions to partition the input data into folds (find.division.strat and find.division.not.strat) and to generate temporary labels for the unlabelled instances (generate_labels), which are used in the learning phase. See Section 5.1.1 of [1] for details about this step.

2) Learning of parameters. This part of the package implements the learning procedure of the COSNet algorithm (see Section 5.2 of [1]). The function generate_points projects the nodes of the training set onto labelled points in the plane (Section 5.2.1), whereas the functions optimizep and optimize_pos_above learn the optimal straight line (Section 5.2.2).

3) Network dynamics and regularization. The function runSubnet runs the dynamics, with the learned parameters, of the sub-network restricted to the unlabelled nodes (Section 5.3 of [1]). In addition, the function reg_data makes it possible to simulate a regularized dynamics (Section 5.6 of [1]).
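The idea behind a stratified partition in the data processing step can be illustrated with a small, self-contained sketch. This is not the package's find.division.strat implementation: the helper name stratified_folds, its arguments and the toy labeling are assumptions made only for illustration. It deals positives and negatives separately into k folds, so that each fold keeps roughly the same share of the rare positive class.

```r
# Hypothetical helper (NOT part of COSNet): split node indices into k folds
# so that every fold receives roughly the same number of positive labels.
stratified_folds <- function(labels, k, seed = 1) {
  set.seed(seed)
  pos <- which(labels == 1)
  neg <- which(labels != 1)
  # Shuffle each class separately (sample.int avoids the sample() pitfall
  # for vectors of length one)
  pos <- pos[sample.int(length(pos))]
  neg <- neg[sample.int(length(neg))]
  # Deal the shuffled indices round-robin into the k folds
  fold_id <- integer(length(labels))
  fold_id[pos] <- rep_len(seq_len(k), length(pos))
  fold_id[neg] <- rep_len(seq_len(k), length(neg))
  split(seq_along(labels), fold_id)
}

# Toy labeling (made up): 10 positives among 100 nodes
labels <- c(rep(1, 10), rep(-1, 90))
folds <- stratified_folds(labels, k = 5)

# Number of positives per fold
print(sapply(folds, function(idx) sum(labels[idx] == 1)))
```

With a non-stratified split, a fold could end up with no positives at all when labels are this unbalanced, which is why the two variants are worth distinguishing.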

Installation

COSNet can be installed by running the R environment and typing:

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")

BiocManager::install("COSNet")

This will download and install the package. Alternatively, the package can be installed manually. For instance, in a Unix/Linux R environment, download the package from http://bioconductor.org/packages/release/bioc/src/contrib/COSNet_1.4.1.tar.gz and save it in the current folder. Then, at the R prompt, type:

install.packages("COSNet_1.4.1.tar.gz", repos=NULL)

The COSNet package has no dependencies. Nevertheless, the experiments reported in the package vignette use the R packages bionetdata and PerfMeas, available from the CRAN repository, to load benchmark data and to compute various performance measures, respectively.

The package PerfMeas in turn depends on the Bioconductor R packages limma, graph, and RBGL, which can be installed as described for the COSNet package.
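The Bioconductor dependencies of PerfMeas could be installed in a single call, mirroring the COSNet installation; the following is a sketch that uses only the package names given in the text and needs network access to run:

```r
# Install the Bioconductor packages that PerfMeas depends on
# (limma, graph and RBGL, as listed in the text).
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")

BiocManager::install(c("limma", "graph", "RBGL"))
```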

To install the bionetdata package, type

install.packages("bionetdata")

or follow the instructions for the manual installation above.

Example

Here is a simple example. See this link, the manual and the package vignette for more examples.

library(COSNet)
library(bionetdata)

# Load binary protein-protein interactions from the STRING
# database (von Mering et al. 2002)
data(Yeast.STRING.data)  # "Yeast.STRING.data"

# FunCat class annotations (0/1) for the genes included
# in Yeast.STRING.data. Annotations refer to the funcat-2.1
# scheme and the funcat-2.1 data (20070316), available from the MIPS web site.
data(Yeast.STRING.FunCat)  # "Yeast.STRING.FunCat"
labels <- Yeast.STRING.FunCat
labels[labels == 0] <- -1

# Exclude the dummy "00" root
labels <- labels[, -which(colnames(labels) == "00")]
n <- nrow(labels)
k <- floor(n/10)
cat("k = ", k, "\n")

# Choose the first class
labeling <- labels[, 1]

# Randomly choose a subset of genes whose labels are hidden
hidden <- sort(sample(1:n, k))
hidden.labels <- labeling[hidden]
labeling[hidden] <- 0

# First run: third argument set to 0
out <- COSNet(Yeast.STRING.data, labeling, 0)
prediction <- out$pred
TP <- sum(hidden.labels == 1 & prediction == 1)
FN <- sum(hidden.labels == 1 & prediction == -1)
FP <- sum(hidden.labels == -1 & prediction == 1)

# Second run: third argument set to 0.0001
out2 <- COSNet(Yeast.STRING.data, labeling, 0.0001)
prediction <- out2$pred
TP2 <- sum(hidden.labels == 1 & prediction == 1)
FN2 <- sum(hidden.labels == 1 & prediction == -1)
FP2 <- sum(hidden.labels == -1 & prediction == 1)
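The counts TP, FN and FP computed in the example are usually summarized as precision, recall and F-measure, which are more informative than accuracy when positives are rare. A minimal sketch follows; the helper prf is hypothetical (it belongs to neither COSNet nor PerfMeas), and the counts passed to it are made up for illustration.

```r
# Hypothetical helper (NOT part of COSNet or PerfMeas): precision, recall
# and F-measure from true positives, false positives and false negatives.
prf <- function(TP, FP, FN) {
  precision <- if (TP + FP > 0) TP / (TP + FP) else 0
  recall <- if (TP + FN > 0) TP / (TP + FN) else 0
  f1 <- if (precision + recall > 0) {
    2 * precision * recall / (precision + recall)
  } else {
    0
  }
  c(precision = precision, recall = recall, F1 = f1)
}

# Made-up counts, for illustration only
res <- prf(TP = 30, FP = 10, FN = 20)
print(res)
```

After running the example, prf(TP, FP, FN) and prf(TP2, FP2, FN2) would give a side-by-side comparison of the two COSNet runs.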

References

[1] Frasca, M., Bertoni, A., Re, M. and Valentini, G. "A neural network algorithm for semi-supervised node label learning from unbalanced data". Neural Networks, 43, 84-98, 2013.

[2] Bertoni, A., Frasca, M. and Valentini, G. "COSNet: a Cost Sensitive Neural Network for Semi-supervised Learning in Graphs". In: Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2011, Athens, Greece, Proceedings, Part I. Lecture Notes in Artificial Intelligence, vol. 6911, pp. 219-234, Springer, 2011.