Finding communities in large datasets

With larger datasets of tree-ring series, calculating the table of similarities can take a long time, and finding communities takes even longer. It is therefore recommended to use parallel computing for Clique Percolation: `clique_community_names_par(network, k = 3, n_core = 4)`. This reduces the computation time significantly. For most datasets [`clique_community_names()`](../reference/clique%5Fcommunity%5Fnames.html) is sufficiently fast, and for smaller datasets [`clique_community_names_par()`](../reference/clique%5Fcommunity%5Fnames%5Fpar.html) can even be slower because of the overhead of parallelisation. Therefore, start with the function [`clique_community_names()`](../reference/clique%5Fcommunity%5Fnames.html) and switch to [`clique_community_names_par()`](../reference/clique%5Fcommunity%5Fnames%5Fpar.html) only if it turns out to be very slow.
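One way to decide between the two functions is to time both on the same network. The following is a minimal sketch, assuming `network` is the igraph object created earlier in the workflow. The helper `compare_cpm_timing()` and its `serial_fun` and `parallel_fun` arguments are hypothetical and not part of dendroNetwork; only the two community functions and their `k` and `n_core` arguments come from the text above.

```r
# Hypothetical helper (not part of dendroNetwork): time the serial and the
# parallel Clique Percolation functions on the same network.
compare_cpm_timing <- function(network, k = 3, n_core = 4,
                               serial_fun = dendroNetwork::clique_community_names,
                               parallel_fun = dendroNetwork::clique_community_names_par) {
  # system.time() reports user, system and elapsed (wall-clock) seconds;
  # elapsed time is the relevant figure when comparing with a parallel run.
  t_serial <- system.time(serial_fun(network, k = k))[["elapsed"]]
  t_parallel <- system.time(
    parallel_fun(network, k = k, n_core = n_core)
  )[["elapsed"]]

  # Named numeric vector with the elapsed seconds of both runs.
  c(serial = t_serial, parallel = t_parallel)
}
```

Calling `compare_cpm_timing(network, k = 3, n_core = 4)` returns the elapsed seconds for both variants. If the serial time is already acceptable, there is little reason to switch.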

The workflow is similar to the one described in the [`vignette("dendroNetwork")`](../articles/dendroNetwork.html), but with minor changes:

  1. Load the network.
  2. Compute the similarities.
  3. Find the maximum clique size: `igraph::clique_num(network)`.
  4. Detect communities for each clique size separately:
    • `com_cpm_k3 <- clique_community_names_par(network, k = 3, n_core = 6)`
    • `com_cpm_k4 <- clique_community_names_par(network, k = 4, n_core = 6)`
    • and so on until the maximum clique size.
  5. Merge these into a single data frame: `com_cpm_all <- rbind(com_cpm_k3, com_cpm_k4, com_cpm_k5, ...)`.
  6. Create a table with all communities for use in Cytoscape: `com_cpm_all <- com_cpm_all |> dplyr::count(node, com_name) |> tidyr::spread(com_name, n)`.
  7. Continue with the visualisation in Cytoscape; see the relevant section in the [`vignette("dendroNetwork")`](../articles/dendroNetwork.html).
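
Steps 3 to 6 repeat the same call for every clique size, so they can be wrapped in a loop. The following is a minimal sketch, assuming that `network` is the igraph object from the earlier steps and that the community functions return a data frame with the columns `node` and `com_name` (as the `dplyr::count()` call in step 6 implies). The helpers `cpm_all_sizes()` and `cpm_wide_table()` and the `detect` argument are hypothetical and not part of dendroNetwork.

```r
# Hypothetical helper (not part of dendroNetwork): run Clique Percolation for
# every clique size from 3 up to the maximum and stack the results (steps 3-5).
cpm_all_sizes <- function(network, n_core = 4,
                          detect = dendroNetwork::clique_community_names_par) {
  # Step 3: size of the largest clique in the network.
  max_k <- igraph::clique_num(network)
  if (max_k < 3) {
    stop("The network contains no cliques of size 3 or larger.")
  }

  # Step 4: one community table per clique size k = 3, 4, ..., max_k.
  per_k <- lapply(seq(3, max_k), function(k) {
    detect(network, k = k, n_core = n_core)
  })

  # Step 5: merge all tables into a single long data frame.
  do.call(rbind, per_k)
}

# Hypothetical helper: one row per node, one column per community (step 6).
cpm_wide_table <- function(com_cpm_all) {
  com_cpm_all |>
    dplyr::count(node, com_name) |>
    tidyr::spread(com_name, n)
}
```

With these helpers, `com_cpm_all <- cpm_all_sizes(network, n_core = 6)` followed by `cpm_wide_table(com_cpm_all)` produces the same kind of table as steps 4 to 6, without writing out one line per clique size.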