Get Sample Groups from Signature Decomposition Information: get_groups
One of the key results of signature analysis is the clustering of samples into different groups. This function takes a Signature
object as input and returns the cluster membership of each sample.
get_groups(
Signature,
method = c("consensus", "k-means", "exposure", "samples"),
n_cluster = NULL,
match_consensus = TRUE
)
Arguments
Signature
a Signature object obtained from either sig_extract or sig_auto_extract. It can also be a relative exposure result in data.table format from sig_fit.
method
grouping method (see Details for more), one of the following:
- 'consensus' - returns the cluster membership based on hierarchical clustering of the consensus matrix. It can only be used for results obtained by [sig_extract()](sig%5Fextract.html) with multiple runs using the NMF package.
- 'k-means' - returns the clusters found by k-means.
- 'exposure' - assigns each sample to the group whose signature exposure is dominant.
- 'samples' - returns the cluster membership based on the contribution of each signature to each sample. It can only be used for results obtained by [sig_extract()](sig%5Fextract.html) using the NMF package.
n_cluster
only used when the method is 'k-means'.
match_consensus
only used when the method is 'consensus'. If TRUE, the result will match the order shown in the consensus map.
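As a rough illustration of the 'exposure' rule described above, the following base R sketch assigns each sample to the signature with the largest relative exposure. The matrix `expo` and its sample names are invented for illustration, and this is not the package's internal code.

```r
# Hypothetical relative exposure matrix: rows are signatures, columns are samples
expo <- matrix(
  c(0.9, 0.1,   # sample S1
    0.2, 0.8,   # sample S2
    0.6, 0.4),  # sample S3
  nrow = 2,
  dimnames = list(c("Sig1", "Sig2"), c("S1", "S2", "S3"))
)

# For each sample (column), pick the signature with the largest exposure
dominant <- rownames(expo)[apply(expo, 2, which.max)]

groups <- data.frame(
  sample = colnames(expo),
  enrich_sig = dominant,
  stringsAsFactors = FALSE
)
groups
```

Samples that share the same dominant signature would then fall into the same group.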
Value
a data.table object
Details
Users may find larger differences than expected between methods 'samples' and 'exposure', even though both use a similar idea to find the dominant signature. The reason is as follows:
Method 'samples' uses data directly from the NMF decomposition, that is, the two matrices W (basis matrix or signature matrix) and H (coefficient matrix or exposure matrix) that result from NMF. Method 'exposure' uses the signature exposure loading matrix, in which each signature represents a number of mutations (alterations). For implementation details, see the source code of the [sig_extract()](sig%5Fextract.html) function.
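The toy sketch below shows one way the two methods could disagree. It assumes, purely for illustration, that the exposure of a signature is its row of H multiplied by the column sum of W, so that it is expressed as a number of mutations. The matrices are invented, and the actual computation should be checked in the source code of sig_extract().

```r
# Toy NMF factors (invented numbers):
# W is components x signatures, H is signatures x samples
W <- matrix(c(1, 1, 10, 10), nrow = 2,
            dimnames = list(c("c1", "c2"), c("Sig1", "Sig2")))
H <- matrix(c(3, 1, 1, 3), nrow = 2,
            dimnames = list(c("Sig1", "Sig2"), c("S1", "S2")))

# Dominant signature taken directly from the coefficient matrix H
dom_H <- rownames(H)[apply(H, 2, which.max)]

# Assumed rescaling: weight each row of H by the total weight of its signature in W
expo <- H * colSums(W)
dom_expo <- rownames(expo)[apply(expo, 2, which.max)]

data.frame(sample = colnames(H), from_H = dom_H, from_exposure = dom_expo)
```

In this toy case sample S1 is dominated by Sig1 in H but by Sig2 after rescaling, because Sig2 carries much more total weight in W.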
See also
Examples
# \donttest{
# Load copy number prepare object
load(system.file("extdata", "toy_copynumber_tally_W.RData",
package = "sigminer", mustWork = TRUE
))
# Extract copy number signatures
library(NMF)
#> Loading required package: registry
#> Loading required package: rngtools
#> Loading required package: cluster
#> NMF - BioConductor layer [OK] | Shared memory capabilities [NO: bigmemory] | Cores 2/2
#> To enable shared memory capabilities, try: install.extras('NMF')
sig <- sig_extract(cn_tally_W$nmf_matrix, 2,
nrun = 10
)
#> NMF algorithm: 'brunet'
#> Multiple runs: 10
#> Mode: sequential [foreach:doParallelMC]
#>
#> Runs: |==================================================| 100%
#> System time:
#> user system elapsed
#> 4.491 0.000 4.490
# Methods 'consensus' and 'samples' are from NMF::predict()
g1 <- get_groups(sig, method = "consensus", match_consensus = TRUE)
#> ℹ [2024-08-04 14:38:58.265139]: Started.
#> ✔ [2024-08-04 14:38:58.26683]: 'Signature' object detected.
#> ℹ [2024-08-04 14:38:58.268319]: Obtaining clusters from the hierarchical clustering of the consensus matrix...
#> ℹ [2024-08-04 14:38:58.285673]: Finding the dominant signature of each group...
#> => Generating a table of group and dominant signature:
#>
#> Sig1 Sig2
#> 1 0 2
#> 2 8 0
#> => Assigning a group to a signature with the maxium fraction (stored in 'map_table' attr)...
#> ℹ [2024-08-04 14:38:58.30013]: Summarizing...
#> group #1: 2 samples with Sig2 enriched.
#> group #2: 8 samples with Sig1 enriched.
#> ! [2024-08-04 14:38:58.302022]: The 'enrich_sig' column is set to dominant signature in one group, please check and make it consistent with biological meaning (correct it by hand if necessary).
#> ℹ [2024-08-04 14:38:58.303385]: 0.038 secs elapsed.
g1
#> sample group silhouette_width enrich_sig
#> <char> <char> <num> <char>
#> 1: TCGA-05-4417-01A-22D-1854-01 1 1.000 Sig2
#> 2: TCGA-99-7458-01A-11D-2035-01 1 0.986 Sig2
#> 3: TCGA-CV-7432-01A-11D-2128-01 2 0.986 Sig1
#> 4: TCGA-DF-A2KN-01A-11D-A17U-01 2 0.986 Sig1
#> 5: TCGA-B6-A0X5-01A-21D-A107-01 2 0.986 Sig1
#> 6: TCGA-A8-A07S-01A-11D-A036-01 2 0.986 Sig1
#> 7: TCGA-A5-A0G2-01A-11D-A042-01 2 0.986 Sig1
#> 8: TCGA-26-6174-01A-21D-1842-01 2 0.986 Sig1
#> 9: TCGA-06-0644-01A-02D-0310-01 2 1.000 Sig1
#> 10: TCGA-19-2621-01B-01D-0911-01 2 0.889 Sig1
g2 <- get_groups(sig, method = "samples")
#> ℹ [2024-08-04 14:38:58.307234]: Started.
#> ✔ [2024-08-04 14:38:58.308626]: 'Signature' object detected.
#> ℹ [2024-08-04 14:38:58.309989]: Obtaining clusters by the contribution of signature to each sample...
#> ℹ [2024-08-04 14:38:58.312675]: Finding the dominant signature of each group...
#> => Generating a table of group and dominant signature:
#>
#> Sig1 Sig2
#> 1 0 2
#> 2 8 0
#> => Assigning a group to a signature with the maxium fraction (stored in 'map_table' attr)...
#> ℹ [2024-08-04 14:38:58.326211]: Summarizing...
#> group #1: 2 samples with Sig2 enriched.
#> group #2: 8 samples with Sig1 enriched.
#> ! [2024-08-04 14:38:58.328055]: The 'enrich_sig' column is set to dominant signature in one group, please check and make it consistent with biological meaning (correct it by hand if necessary).
#> ℹ [2024-08-04 14:38:58.329458]: 0.022 secs elapsed.
g2
#> sample group silhouette_width prob enrich_sig
#> <char> <char> <num> <num> <char>
#> 1: TCGA-05-4417-01A-22D-1854-01 1 1 1.000 Sig2
#> 2: TCGA-06-0644-01A-02D-0310-01 2 1 0.787 Sig1
#> 3: TCGA-19-2621-01B-01D-0911-01 2 1 1.000 Sig1
#> 4: TCGA-26-6174-01A-21D-1842-01 2 1 1.000 Sig1
#> 5: TCGA-99-7458-01A-11D-2035-01 1 1 0.679 Sig2
#> 6: TCGA-A5-A0G2-01A-11D-A042-01 2 1 0.598 Sig1
#> 7: TCGA-A8-A07S-01A-11D-A036-01 2 1 0.975 Sig1
#> 8: TCGA-B6-A0X5-01A-21D-A107-01 2 1 1.000 Sig1
#> 9: TCGA-CV-7432-01A-11D-2128-01 2 1 0.544 Sig1
#> 10: TCGA-DF-A2KN-01A-11D-A17U-01 2 1 1.000 Sig1
# Use k-means clustering
g3 <- get_groups(sig, method = "k-means")
#> ℹ [2024-08-04 14:38:58.333421]: Started.
#> ✔ [2024-08-04 14:38:58.334835]: 'Signature' object detected.
#> ℹ [2024-08-04 14:38:58.338967]: Running k-means with 2 clusters...
#> ℹ [2024-08-04 14:38:58.34221]: Generating a table of group and signature contribution (stored in 'map_table' attr):
#> Sig1 Sig2
#> 1 0.2097559 0.7901116
#> 2 0.8964984 0.1035016
#> ℹ [2024-08-04 14:38:58.34429]: Assigning a group to a signature with the maximum fraction...
#> ℹ [2024-08-04 14:38:58.34854]: Summarizing...
#> group #1: 2 samples with Sig2 enriched.
#> group #2: 8 samples with Sig1 enriched.
#> ! [2024-08-04 14:38:58.350393]: The 'enrich_sig' column is set to dominant signature in one group, please check and make it consistent with biological meaning (correct it by hand if necessary).
#> ℹ [2024-08-04 14:38:58.351754]: 0.018 secs elapsed.
g3
#> Key: <group>
#> sample group silhouette_width enrich_sig
#> <char> <char> <num> <char>
#> 1: TCGA-05-4417-01A-22D-1854-01 1 0.532 Sig2
#> 2: TCGA-99-7458-01A-11D-2035-01 1 0.121 Sig2
#> 3: TCGA-06-0644-01A-02D-0310-01 2 0.755 Sig1
#> 4: TCGA-19-2621-01B-01D-0911-01 2 0.850 Sig1
#> 5: TCGA-26-6174-01A-21D-1842-01 2 0.850 Sig1
#> 6: TCGA-A5-A0G2-01A-11D-A042-01 2 0.493 Sig1
#> 7: TCGA-A8-A07S-01A-11D-A036-01 2 0.847 Sig1
#> 8: TCGA-B6-A0X5-01A-21D-A107-01 2 0.850 Sig1
#> 9: TCGA-CV-7432-01A-11D-2128-01 2 0.341 Sig1
#> 10: TCGA-DF-A2KN-01A-11D-A17U-01 2 0.850 Sig1
# }
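The k-means output above reports a group, a silhouette width and an enriched signature for each sample. The sketch below reproduces that kind of table with `stats::kmeans()` and `cluster::silhouette()` on an invented relative exposure matrix. It is an approximation under the assumption that clusters are labelled by the signature with the largest mean exposure, not a copy of what get_groups() does internally.

```r
library(cluster)  # provides silhouette()

# Hypothetical relative exposures: rows are samples, columns are signatures
expo <- rbind(
  S1 = c(Sig1 = 0.10, Sig2 = 0.90),
  S2 = c(Sig1 = 0.20, Sig2 = 0.80),
  S3 = c(Sig1 = 0.90, Sig2 = 0.10),
  S4 = c(Sig1 = 0.85, Sig2 = 0.15),
  S5 = c(Sig1 = 0.95, Sig2 = 0.05)
)

set.seed(1)  # k-means starts from random centers
km <- kmeans(expo, centers = 2, nstart = 10)

# Silhouette width of each sample under the k-means partition
sil <- silhouette(km$cluster, dist(expo))

# Label each cluster by the signature with the largest mean exposure (cluster center)
enrich <- colnames(km$centers)[apply(km$centers, 1, which.max)]

res <- data.frame(
  sample = rownames(expo),
  group = km$cluster,
  silhouette_width = round(sil[, "sil_width"], 3),
  enrich_sig = enrich[km$cluster],
  stringsAsFactors = FALSE
)
res
```

Cluster numbers from k-means are arbitrary, so only the pairing between a group and its enriched signature is meaningful.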