A Simple Example
Download a copy of the vignette to follow along here: a_simple_example.Rmd
In this vignette, we will show how metasnf can be used for a very simple SNF workflow.
This workflow is the SNF example provided in the original SNFtool package. You can find it by loading the SNFtool package and viewing the documentation for the main SNF function with ?SNF.
The original SNF example
1. Load the package
library(SNFtool)
2. Set SNF hyperparameters
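Three hyperparameters are introduced in this example: K, alpha (also referred to as sigma or eta in different documentation), and T. K is the number of nearest neighbors considered for each patient, alpha controls the width of the kernel that turns distances into similarities, and T is the number of iterations of the fusion process. You can learn more about the significance of these hyperparameters in the original SNF paper (see references).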
K <- 20
alpha <- 0.5
T <- 20
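To make the role of K a little more concrete: it sets how many nearest neighbors each patient has when similarities are computed and fused. The sketch below finds each row's K nearest neighbors from a distance matrix in base R. `nearest_neighbors` is a hypothetical helper written for illustration only; it is not part of SNFtool or metasnf, and the toy points are made up.

```r
# Hypothetical helper (not part of SNFtool or metasnf): for each row of a
# square distance matrix, return the indices of its K closest other rows.
nearest_neighbors <- function(dist_mat, K) {
  n <- nrow(dist_mat)
  nn <- vapply(seq_len(n), function(i) {
    d <- dist_mat[i, ]
    d[i] <- Inf                 # a point is not its own neighbor
    order(d)[seq_len(K)]        # indices of the K smallest distances
  }, integer(K))
  # vapply() stacks results as columns; return one row per point instead
  matrix(nn, nrow = n, byrow = TRUE)
}

# Toy example: five points on a line (illustrative values only)
x <- matrix(c(0, 1, 2, 10, 11), ncol = 1)
dist_mat <- as.matrix(dist(x))
nearest_neighbors(dist_mat, K = 2)
```

Larger K means each patient's similarity is informed by a wider neighborhood, which smooths the network but can blur small groups.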
3. Load the data
The SNFtool package provides two mock data frames titled Data1 and Data2 for this example. Data1 contains gene expression values of two genes for 200 patients. Data2 similarly contains methylation data for two genes for those same 200 patients.
data(Data1)
data(Data2)
Here’s what the mock data looks like:
library(ComplexHeatmap)
# gene expression data
gene_expression_hm <- Heatmap(
as.matrix(Data1),
cluster_rows = FALSE,
cluster_columns = FALSE,
show_row_names = FALSE,
show_column_names = FALSE,
heatmap_legend_param = list(
title = "Gene Expression"
)
)
gene_expression_hm
Heatmap of gene expression values.
# methylation data
methylation_hm <- Heatmap(
as.matrix(Data2),
cluster_rows = FALSE,
cluster_columns = FALSE,
show_row_names = FALSE,
show_column_names = FALSE,
heatmap_legend_param = list(
title = "Methylation"
)
)
methylation_hm
Heatmap of methylation values.
The “ground truth” here is that patients 1 to 100 were drawn from one distribution and patients 101 to 200 were drawn from another. We don’t have access to that kind of knowledge with real data, but we do here.
true_label <- c(matrix(1, 100, 1), matrix(2, 100, 1))
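The same label vector can be built more directly with `rep()`. This is simply an equivalent base R idiom, not something the original SNFtool example uses.

```r
# Patients 1-100 belong to group 1, patients 101-200 to group 2
true_label <- rep(c(1, 2), each = 100)
table(true_label)
```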
4. Generate similarity matrices for each data source
We treat the two gene expression features in Data1 as one broader gene expression data source, and the two methylation features in Data2 as one broader methylation data source.
The next step is to determine, for each of the sources we have, how similar all of our patients are to each other.
This is done by first determining how dissimilar the patients are to each other for each source, and then converting that dissimilarity information into similarity information.
To calculate dissimilarity, we’ll use Euclidean distance.
distance_matrix_1 <- as.matrix(dist(Data1, method = "euclidean"))
distance_matrix_2 <- as.matrix(dist(Data2, method = "euclidean"))
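As a quick check of what `dist(..., method = "euclidean")` computes, the sketch below recomputes the distance between two rows by hand on a small toy matrix. The numbers are made up for illustration.

```r
# Toy data: three patients, two features each (illustrative values only)
toy <- matrix(c(1, 2,
                4, 6,
                1, 3), ncol = 2, byrow = TRUE)

# Euclidean distance: square root of the summed squared feature differences
manual_d12 <- sqrt(sum((toy[1, ] - toy[2, ])^2))

# dist() gives the same value; as.matrix() expands it to a full square matrix
d <- as.matrix(dist(toy, method = "euclidean"))
c(manual = manual_d12, dist = d[1, 2])
```

Both values are 5 here, since the feature differences are 3 and 4. The full matrix is symmetric with zeros on the diagonal, which is the shape affinityMatrix expects.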
Then, we can use the affinityMatrix function provided by SNFtool to convert those distance matrices into similarity matrices.
similarity_matrix_1 <- affinityMatrix(distance_matrix_1, K, alpha)
similarity_matrix_2 <- affinityMatrix(distance_matrix_2, K, alpha)
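The SNF paper (Wang et al. 2014) describes this distance-to-similarity step as a scaled exponential kernel, W(i, j) = exp(-ρ(i, j)² / (μ ε(i, j))), where ε(i, j) averages the mean distance from patient i to its K nearest neighbors, the same quantity for patient j, and ρ(i, j) itself. The sketch below implements that formula directly in base R, assuming `mu` plays the role of alpha. It is an illustration of the idea, not SNFtool's affinityMatrix code, whose exact form may differ, so the numbers should not be expected to match. `scaled_exp_affinity` is a hypothetical name.

```r
# Sketch of the scaled exponential kernel described in the SNF paper.
# NOT SNFtool's exact affinityMatrix() implementation.
#   dist_mat: symmetric distance matrix (e.g. from dist())
#   K:        number of nearest neighbors used for local scaling
#   mu:       kernel width hyperparameter (playing the role of alpha here)
scaled_exp_affinity <- function(dist_mat, K = 20, mu = 0.5) {
  n <- nrow(dist_mat)
  K <- min(K, n - 1)
  # Mean distance from each point to its K nearest neighbors (excluding itself)
  knn_mean <- vapply(seq_len(n), function(i) {
    mean(sort(dist_mat[i, -i])[seq_len(K)])
  }, numeric(1))
  # epsilon_ij = (mean_i + mean_j + rho_ij) / 3, kept strictly positive
  eps <- (outer(knn_mean, knn_mean, "+") + dist_mat) / 3
  eps <- pmax(eps, .Machine$double.eps)
  # W_ij = exp(-rho_ij^2 / (mu * epsilon_ij))
  exp(-dist_mat^2 / (mu * eps))
}

set.seed(1)
# Toy data: two well-separated groups of 10 points in 2D (illustrative only)
toy <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
             matrix(rnorm(20, mean = 5), ncol = 2))
toy_W <- scaled_exp_affinity(as.matrix(dist(toy)), K = 3, mu = 0.5)
round(toy_W[1:3, 1:3], 3)
```

The local scaling term ε(i, j) lets patients in dense regions and patients in sparse regions both receive meaningful similarities, rather than relying on a single global bandwidth.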
Those similarity matrices can be passed into the SNF function to integrate them into a single similarity matrix that describes how similar the patients are to each other across both the gene expression and methylation data.
5. Integrate similarity matrices with SNF
fused_network <- SNF(list(similarity_matrix_1, similarity_matrix_2), K, T)
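For intuition about what SNF does over its T iterations: the SNF paper describes the two-source update as P1 = S1 × P2 × t(S1) and P2 = S2 × P1 × t(S2), where each P is a normalized version of a source's similarity matrix and each S keeps only every patient's K nearest neighbors. The fused network is the average of the final P matrices. The sketch below follows that description in base R; the helper names are hypothetical, the symmetrization step is a simplifying assumption, and SNFtool's SNF() includes additional normalization, so its output will differ.

```r
# Simplified sketch of the two-source SNF update from the SNF paper.
# Helper names are hypothetical; SNFtool::SNF() adds extra normalization
# steps, so its output will not match this exactly.

# Full kernel P: off-diagonal entries of each row sum to 1/2, diagonal is 1/2
normalize_full <- function(W) {
  off_diag <- W
  diag(off_diag) <- 0
  P <- off_diag / (2 * rowSums(off_diag))   # divides row i by its own sum
  diag(P) <- 0.5
  P
}

# Sparse local kernel S: keep each row's K largest similarities, row-normalize
local_kernel <- function(W, K) {
  S <- t(apply(W, 1, function(row) {
    keep <- order(row, decreasing = TRUE)[seq_len(K)]
    out <- numeric(length(row))
    out[keep] <- row[keep]
    out
  }))
  S / rowSums(S)
}

# n_iter corresponds to the T hyperparameter in the vignette
snf_two_sources <- function(W1, W2, K = 20, n_iter = 20) {
  K <- min(K, nrow(W1))
  P1 <- normalize_full(W1)
  P2 <- normalize_full(W2)
  S1 <- local_kernel(W1, K)
  S2 <- local_kernel(W2, K)
  for (iter in seq_len(n_iter)) {
    # Each source's network is diffused through the other source's network
    next_P1 <- S1 %*% P2 %*% t(S1)
    next_P2 <- S2 %*% P1 %*% t(S2)
    # Symmetrize (a simplification to keep the matrices symmetric)
    P1 <- (next_P1 + t(next_P1)) / 2
    P2 <- (next_P2 + t(next_P2)) / 2
  }
  (P1 + P2) / 2   # average the two diffused networks
}

set.seed(2)
# Two toy sources describing the same 20 patients in two groups
source_1 <- rbind(matrix(rnorm(20, 0), ncol = 2), matrix(rnorm(20, 4), ncol = 2))
source_2 <- rbind(matrix(rnorm(20, 0), ncol = 2), matrix(rnorm(20, 4), ncol = 2))
W1 <- exp(-as.matrix(dist(source_1))^2 / 2)
W2 <- exp(-as.matrix(dist(source_2))^2 / 2)
fused <- snf_two_sources(W1, W2, K = 5, n_iter = 10)
```

Because each update multiplies by the other source's network, structure that both sources agree on is reinforced, while neighbors supported by only one source tend to fade over the iterations.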
6. Find clusters in the integrated matrix
If we think there are 2 clusters in the data, we can use spectral clustering to find 2 clusters in the fused network.
number_of_clusters <- 2
assigned_clusters <- spectralClustering(fused_network, number_of_clusters)
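Spectral clustering treats the fused network as a graph: it embeds each patient using the leading eigenvectors of a normalized version of the network and then groups patients in that embedding. The sketch below is a textbook Ng-Jordan-Weiss style variant in base R (normalized affinity, top eigenvectors, row normalization, then k-means). It is not SNFtool's spectralClustering() implementation, so assignments are not guaranteed to match it; `spectral_sketch` and the toy network are hypothetical.

```r
# Ng-Jordan-Weiss style spectral clustering sketch (not SNFtool's
# spectralClustering() implementation).
#   W: symmetric, non-negative similarity matrix; n_clusters: number of clusters
spectral_sketch <- function(W, n_clusters) {
  d_inv_sqrt <- 1 / sqrt(rowSums(W))
  # Normalized affinity D^{-1/2} W D^{-1/2}
  M <- W * outer(d_inv_sqrt, d_inv_sqrt)
  # eigen() sorts eigenvalues in decreasing order, so the first columns are
  # the leading eigenvectors
  U <- eigen(M, symmetric = TRUE)$vectors[, seq_len(n_clusters), drop = FALSE]
  # Project each patient's row onto the unit sphere, then run k-means
  U <- U / sqrt(rowSums(U^2))
  kmeans(U, centers = n_clusters, nstart = 20)$cluster
}

set.seed(42)
# Toy network: two blocks of 3 patients, strongly similar within each block
toy_W <- matrix(0.01, 6, 6)
toy_W[1:3, 1:3] <- 1
toy_W[4:6, 4:6] <- 1
spectral_sketch(toy_W, n_clusters = 2)
```

Patients 1 to 3 and patients 4 to 6 should land in different clusters, although which group is called 1 and which is called 2 is arbitrary.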
Sure enough, we are able to obtain the correct cluster label for all patients.
all(true_label == assigned_clusters)
#> [1] TRUE
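Cluster numbers are arbitrary: spectral clustering could just as well have called the first group 2 and the second group 1, in which case `all(true_label == assigned_clusters)` would return FALSE even though the grouping is perfect. A check that ignores the numbering is to cross-tabulate the two labelings and confirm that each true group maps to exactly one cluster. The helper below is a hypothetical base R sketch of that idea; for partial agreement, a measure such as the adjusted Rand index is the usual choice.

```r
# Hypothetical helper: TRUE if two label vectors describe the same partition,
# regardless of how the groups are numbered
same_partition <- function(labels_a, labels_b) {
  tab <- table(labels_a, labels_b)
  # Same partition iff every row and every column of the contingency table
  # has exactly one non-zero cell
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

true_label <- rep(c(1, 2), each = 100)
swapped <- rep(c(2, 1), each = 100)    # same grouping, opposite numbering
all(true_label == swapped)             # FALSE
same_partition(true_label, swapped)    # TRUE
```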
References
Wang, Bo, Aziz M. Mezlini, Feyyaz Demir, Marc Fiume, Zhuowen Tu, Michael Brudno, Benjamin Haibe-Kains, and Anna Goldenberg. 2014. “Similarity Network Fusion for Aggregating Data Types on a Genomic Scale.” Nature Methods 11 (3): 333–37. https://doi.org/10.1038/nmeth.2810.