Finding communities in large datasets
With larger datasets of tree-ring series, calculating the similarity table can take a long time, and finding communities can take even longer. For such datasets it is therefore recommended to use parallel computing for Clique Percolation: `clique_community_names_par(network, k=3, n_core = 4)`. This reduces the computation time significantly. For most datasets [clique_community_names()](../reference/clique%5Fcommunity%5Fnames.html) is sufficiently fast, and for smaller datasets [clique_community_names_par()](../reference/clique%5Fcommunity%5Fnames%5Fpar.html) can even be slower because of the overhead of parallelisation. Therefore, start with the function [clique_community_names()](../reference/clique%5Fcommunity%5Fnames.html) and switch to [clique_community_names_par()](../reference/clique%5Fcommunity%5Fnames%5Fpar.html) only if it turns out to be very slow.
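One rough way to choose between the two functions is to time both on the same network. The sketch below assumes the `hol_rom` example data and the `sim_table()` and `dendro_network()` steps from the main vignette. The values `k = 3` and `n_core = 4` are only examples and should be adapted to the dataset and machine.

```r
library(dendroNetwork)

# Build an example network (assumption: data and steps as in vignette("dendroNetwork")).
data(hol_rom)
sim_table_hol <- sim_table(hol_rom)
network <- dendro_network(sim_table_hol)

# Time the sequential version first.
t_seq <- system.time(
  com_seq <- clique_community_names(network, k = 3)
)

# Time the parallel version; n_core = 4 is an example value.
t_par <- system.time(
  com_par <- clique_community_names_par(network, k = 3, n_core = 4)
)

# Elapsed wall-clock time in seconds for both runs.
timings <- c(sequential = t_seq[["elapsed"]], parallel = t_par[["elapsed"]])
timings
```

If the sequential run finishes within a few seconds, the parallel version is unlikely to be worth the extra start-up cost.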
The workflow is similar to the one described in [vignette("dendroNetwork")](../articles/dendroNetwork.html), but with minor changes:
- load network.
- compute similarities.
- find the maximum clique size: `igraph::clique_num(network)`.
- detect communities for each clique size separately:
  - `com_cpm_k3 <- clique_community_names_par(network, k=3, n_core = 6)`.
  - `com_cpm_k4 <- clique_community_names_par(network, k=4, n_core = 6)`.
  - and so on until the maximum clique size.
- merge these into a single `data frame` by `com_cpm_all <- rbind(com_cpm_k3, com_cpm_k4, com_cpm_k5, ...)`.
- create table for use in Cytoscape with all communities: `com_cpm_all <- com_cpm_all |> dplyr::count(node, com_name) |> tidyr::spread(com_name, n)`.
- continue with the visualisation in Cytoscape, see the relevant section in the [vignette("dendroNetwork")](../articles/dendroNetwork.html).
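The steps above can be sketched as a single script that loops over the clique sizes instead of writing one call per `k`. This is a minimal sketch, not the package's own implementation. It assumes the `hol_rom` example data and the `sim_table()` and `dendro_network()` steps from the main vignette, uses `n_core = 4` as an example value, and introduces the names `k_max`, `com_cpm_list` and `com_cpm_wide` purely for illustration.

```r
library(dendroNetwork)

# Load data, compute similarities and create the network
# (assumption: example data and steps as in vignette("dendroNetwork")).
data(hol_rom)
sim_table_hol <- sim_table(hol_rom)
network <- dendro_network(sim_table_hol)

# The maximum clique size is the upper bound for k.
k_max <- igraph::clique_num(network)
stopifnot(k_max >= 3)  # clique percolation here starts at k = 3

# Detect communities for each clique size separately.
com_cpm_list <- lapply(3:k_max, function(k) {
  clique_community_names_par(network, k = k, n_core = 4)
})

# Merge the per-k results into a single data frame.
com_cpm_all <- do.call(rbind, com_cpm_list)

# Wide table for Cytoscape: one row per node, one column per community.
com_cpm_wide <- com_cpm_all |>
  dplyr::count(node, com_name) |>
  tidyr::spread(com_name, n)
```

Unlike the steps above, the sketch stores the wide table under a separate name, so the merged long table `com_cpm_all` stays available for inspection.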