A TCGA dataset application (original) (raw)
3.1. miRNA:target pairs
data("mirtarbasegene")
head(mirtarbasegene)
#> # A tibble: 6 × 2
#> miRNA Target
#> <chr> <chr>
#> 1 hsa-miR-20a-5p HIF1A
#> 2 hsa-miR-146a-5p CXCR4
#> 3 hsa-miR-122-5p CYP7A1
#> 4 hsa-miR-222-3p STAT5A
#> 5 hsa-miR-21-5p RASGRP1
#> 6 hsa-miR-148a-3p DNMT1
NOTE if the mirna:target dataset includes miRNA genes as targets, the priming_graph()
function can fail. Because, the function define to miRNAs and targets without distinguishing between uppercase or lowercase.
3.2. Gene expression in normal and tumor samples
The gene and mirna expression counts of patient barcoded with TCGA-E9-A1N5 is retrieved from TCGA via TCGAbiolinks
package (Colaprico et al. 2015) from Bioconductor
. The instructions of retrieving data can be found at TCGAbiolinks
manual.
For this step you don’t have to use TCGA data, any other source or package can be utilized.
data("TCGA_E9_A1N5_normal")
head(TCGA_E9_A1N5_normal)
#> # A tibble: 6 × 7
#> patient sample barcode definition ensembl_gene_id external_gene_name
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 TCGA-E9-A1N5 TCGA-E9-A1… TCGA-E… Solid Tis… ENSG00000000003 TSPAN6
#> 2 TCGA-E9-A1N5 TCGA-E9-A1… TCGA-E… Solid Tis… ENSG00000000005 TNMD
#> 3 TCGA-E9-A1N5 TCGA-E9-A1… TCGA-E… Solid Tis… ENSG00000000419 DPM1
#> 4 TCGA-E9-A1N5 TCGA-E9-A1… TCGA-E… Solid Tis… ENSG00000000457 SCYL3
#> 5 TCGA-E9-A1N5 TCGA-E9-A1… TCGA-E… Solid Tis… ENSG00000000460 C1orf112
#> 6 TCGA-E9-A1N5 TCGA-E9-A1… TCGA-E… Solid Tis… ENSG00000000938 FGR
#> # ℹ 1 more variable: gene_expression <dbl>
data("TCGA_E9_A1N5_tumor")
head(TCGA_E9_A1N5_tumor)
#> # A tibble: 6 × 7
#> patient sample barcode definition ensembl_gene_id external_gene_name
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 TCGA-E9-A1N5 TCGA-E9-A1… TCGA-E… Primary s… ENSG00000000003 TSPAN6
#> 2 TCGA-E9-A1N5 TCGA-E9-A1… TCGA-E… Primary s… ENSG00000000005 TNMD
#> 3 TCGA-E9-A1N5 TCGA-E9-A1… TCGA-E… Primary s… ENSG00000000419 DPM1
#> 4 TCGA-E9-A1N5 TCGA-E9-A1… TCGA-E… Primary s… ENSG00000000457 SCYL3
#> 5 TCGA-E9-A1N5 TCGA-E9-A1… TCGA-E… Primary s… ENSG00000000460 C1orf112
#> 6 TCGA-E9-A1N5 TCGA-E9-A1… TCGA-E… Primary s… ENSG00000000938 FGR
#> # ℹ 1 more variable: gene_expression <dbl>
3.3. miRNA expression data
data("TCGA_E9_A1N5_mirnatumor")
head(TCGA_E9_A1N5_mirnatumor)
#> # A tibble: 6 × 6
#> barcode mirbase_ID miRNA Precusor total_read total_RPM
#> <chr> <chr> <chr> <chr> <int> <dbl>
#> 1 TCGA-E9-A1N5-01A-11R-A14C-13 MIMAT0000062 hsa-l… MI00000… 45725 20802.
#> 2 TCGA-E9-A1N5-01A-11R-A14C-13 MIMAT0004481 hsa-l… MI00000… 100 45.5
#> 3 TCGA-E9-A1N5-01A-11R-A14C-13 MIMAT0010195 hsa-l… MI00000… 6 2.73
#> 4 TCGA-E9-A1N5-01A-11R-A14C-13 MIMAT0000063 hsa-l… MI00000… 43489 19785.
#> 5 TCGA-E9-A1N5-01A-11R-A14C-13 MIMAT0004482 hsa-l… MI00000… 126 57.3
#> 6 TCGA-E9-A1N5-01A-11R-A14C-13 MIMAT0000064 hsa-l… MI00000… 2002 911.
data("TCGA_E9_A1N5_mirnanormal")
head(TCGA_E9_A1N5_mirnanormal)
#> # A tibble: 6 × 6
#> barcode mirbase_ID miRNA Precusor total_read total_RPM
#> <chr> <chr> <chr> <chr> <int> <dbl>
#> 1 TCGA-E9-A1N5-11A-41R-A14C-13 MIMAT0000062 hsa-l… MI00000… 67599 37068.
#> 2 TCGA-E9-A1N5-11A-41R-A14C-13 MIMAT0004481 hsa-l… MI00000… 132 72.4
#> 3 TCGA-E9-A1N5-11A-41R-A14C-13 MIMAT0010195 hsa-l… MI00000… 57 31.3
#> 4 TCGA-E9-A1N5-11A-41R-A14C-13 MIMAT0000063 hsa-l… MI00000… 47266 25918.
#> 5 TCGA-E9-A1N5-11A-41R-A14C-13 MIMAT0004482 hsa-l… MI00000… 126 69.1
#> 6 TCGA-E9-A1N5-11A-41R-A14C-13 MIMAT0000064 hsa-l… MI00000… 14554 7981.
Here’s the summary of size of each dataset
Dataset name | Number of rows |
---|---|
mirtarbasegene | 380627 |
TCGA_E9_A1N5_normal | 56830 |
TCGA_E9_A1N5_tumor | 56830 |
TCGA_E9_A1N5_mirnanormal | 750 |
TCGA_E9_A1N5_mirnatumor | 648 |
3.4. Integrating and analysing data
All of these datasets are integrated using the code below resulting in miRNA:target dataset that contains miRNA and gene expression values.
TCGA_E9_A1N5_mirnagene <- TCGA_E9_A1N5_mirnanormal %>%
inner_join(mirtarbasegene, by= "miRNA") %>%
inner_join(TCGA_E9_A1N5_normal,
by = c("Target"= "external_gene_name")) %>%
select(Target, miRNA, total_read, gene_expression) %>%
distinct()
#> Warning in inner_join(., TCGA_E9_A1N5_normal, by = c(Target = "external_gene_name")): Detected an unexpected many-to-many relationship between `x` and `y`.
#> ℹ Row 3405 of `x` matches multiple rows in `y`.
#> ℹ Row 842 of `y` matches multiple rows in `x`.
#> ℹ If a many-to-many relationship is expected, set `relationship =
#> "many-to-many"` to silence this warning.
Note: Some of genes have expression values more than one because some of tissue samples were sequenced in two medium separately. So, we select maximum expression values of that genes at following:
#> # A tibble: 26 × 3
#> # Groups: Target, miRNA [26]
#> Target miRNA n
#> <chr> <chr> <int>
#> 1 COG8 hsa-miR-186-5p 2
#> 2 GOLGA8M hsa-miR-1270 2
#> 3 GOLGA8M hsa-miR-5703 2
#> 4 MATR3 hsa-let-7e-5p 2
#> 5 MATR3 hsa-miR-1-3p 2
#> 6 MATR3 hsa-miR-10b-3p 2
#> 7 MATR3 hsa-miR-125b-5p 2
#> 8 MATR3 hsa-miR-149-5p 2
#> 9 MATR3 hsa-miR-155-5p 2
#> 10 MATR3 hsa-miR-16-1-3p 2
#> # ℹ 16 more rows
head(TCGA_E9_A1N5_mirnagene)
#> # A tibble: 6 × 4
#> Target miRNA total_read gene_expression
#> <chr> <chr> <int> <dbl>
#> 1 CDK6 hsa-let-7a-5p 67599 4669
#> 2 MYC hsa-let-7a-5p 67599 11593
#> 3 BCL2 hsa-let-7a-5p 67599 2445
#> 4 NKIRAS2 hsa-let-7a-5p 67599 1519
#> 5 ITGB3 hsa-let-7a-5p 67599 196
#> 6 NF2 hsa-let-7a-5p 67599 1755
When we compared the two gene expression dataset of TCGA-E9A1N5 patient, and selected a gene which has 30-fold increased expression, (gene name: HIST1H3H), this gene node will be used in the example. Note that the selected node must not be isolated one. If the an isolated node is selected the change in expression will not propagate in network. (You can see commands for node selection in the vignette The auxiliary commands which can help to the users)
Optionally, you can filter the low expressed gene nodes because they are not effective elements.
TCGA_E9_A1N5_mirnagene <- TCGA_E9_A1N5_mirnagene%>%
filter(gene_expression > 10)
The analysis is performed based on amounts of miRNAs and targets as seen. Firstly, we tried to find optimal iteration for the network when simulation start with HIST1H3H node. As an example, simulation()
function was used with cycle = 5
argument, this argument can be arranged according to network. Note that it can be appropriate that using greater number of cycle
for comprehensive network objects.
simulation_res_HIST <- TCGA_E9_A1N5_mirnagene %>%
priming_graph(competing_count = gene_expression,
miRNA_count = total_read) %>%
update_how(node_name = "HIST1H3H", how =30) %>%
simulate(5)
simulation_res_HIST%>%
find_iteration(plot=TRUE)
The graph was shown that the change in expression level of HIST1H3H results in weak perturbation efficiency, despite 30-fold change. The code shown below can be used for calculation of fold changes after simulation HIST1H3H gene to 30 fold:
simulation_res_HIST%>%
as_tibble()%>%
mutate(FC= count_current/initial_count)%>%
arrange(desc(FC))
#> # A tibble: 13,432 × 8
#> name type node_id initial_count count_pre count_current changes_variable
#> <chr> <chr> <int> <dbl> <dbl> <dbl> <chr>
#> 1 HIST1H3H Comp… 9705 27 808 808 Competing
#> 2 CDK6 Comp… 1 4669 4669 4669 Competing
#> 3 MYC Comp… 2 11593 11593 11593 Competing
#> 4 BCL2 Comp… 3 2445 2445 2445 Competing
#> 5 NKIRAS2 Comp… 4 1519 1519 1519 Competing
#> 6 ITGB3 Comp… 5 196 196 196 Competing
#> 7 NF2 Comp… 6 1755 1755 1755 Competing
#> 8 NRAS Comp… 7 2311 2311 2311 Competing
#> 9 KRAS Comp… 8 1204 1204 1204 Competing
#> 10 PRDM1 Comp… 9 456 456 456 Competing
#> # ℹ 13,422 more rows
#> # ℹ 1 more variable: FC <dbl>
And then, we tried to simulate the network with the gene which has higher expression value. For this, we selected ACTB node as shown in The auxiliary commands which can help to the users
simulation_res_ACTB <- TCGA_E9_A1N5_mirnagene %>%
priming_graph(competing_count = gene_expression,
miRNA_count = total_read) %>%
update_how(node_name = "ACTB", how =1.87) %>%
simulate(5)
simulation_res_ACTB%>%
find_iteration(plot=TRUE)
Following codes are shown entire gene fold changes after simulation ACTB gene to 1.87 fold:
simulation_res_ACTB%>%
as_tibble()%>%
mutate(FC= count_current/initial_count)%>%
arrange(desc(FC))
#> # A tibble: 13,432 × 8
#> name type node_id initial_count count_pre count_current changes_variable
#> <chr> <chr> <int> <dbl> <dbl> <dbl> <chr>
#> 1 ACTB Compe… 84 101917 183705 183705 Competing
#> 2 YOD1 Compe… 472 470 472 472 Competing
#> 3 ADIPOR2 Compe… 319 3015 3026 3026 Competing
#> 4 FAM105A Compe… 385 569 571 571 Competing
#> 5 RRM2 Compe… 67 1479 1484 1484 Competing
#> 6 NKIRAS2 Compe… 4 1519 1524 1524 Competing
#> 7 IGF1R Compe… 324 3875 3887 3887 Competing
#> 8 NRAS Compe… 7 2311 2318 2318 Competing
#> 9 SOCS7 Compe… 592 1331 1335 1335 Competing
#> 10 SRD5A3 Compe… 2887 676 678 678 Competing
#> # ℹ 13,422 more rows
#> # ℹ 1 more variable: FC <dbl>
Note: it can be useful that you look at The auxiliary commands which can help to the users for perturbation efficiency of ACTB gene by simulation with same conditions and different expression changes.
3.5. The sum of two conditions:
In a real biological sample, we tested perturbation efficiencies of two genes; * one with low expression but high fold change (HIST1H3H, 30-fold increase in tumor) * another one with high expression but small change in expression level (ACTB, 1.87-fold increase in tumor)
With these two samples, it has been obtained that expression values of genes, rest of the perturbed gene, changed slightly.
Despite high fold change, former gene caused little perturbation. When the perturbation efficiencies of both of these genes are analysed, it has been oberved that HIST1H3H does not affect the other genes in given limit. On the contrary, high expressing gene with very low fold increase in tumor causes greater perturbation in the network. Additionaly, the perturbation efficiency of ACTB gene is quite high from HIST1H3H with 30-fold change, when ACTB is simulated with 30 fold-change.
Thus, if the perturbed node has lower target:total target ratio in group or groups, the efficiency of it can be weak, or vice versa. The efficiency of ACTB gene may be high for this reason, in comparison with HIST1H3H perturbation. In fact, it has been observed that ACTB has not strong perturbation efficiency too. This could be arisen from low miRNA:target ratio or ineffective target nodes which have very low expression levels.