Running MultiRNAflow on a RNA-seq raw counts with different time points and several biological conditions (original) (raw)

Introduction

Quick description of the document

This document is a quick workflow describing how to use our R package MultiRNAflow on one dataset (see Dataset used as example). For a more complete description of our package and complete outputs with graphs, the user must read our pdf file entitled ’MultiRNAflow_vignette-knitr.pdf".

Dataset used as example

The Mouse dataset 2 (Weger et al. 2021) is accessible on the Gene Expression Omnibus (GEO) database with the accession number GSE135898.

This dataset contains the temporal transcription profile of 16 mice with Bmal1 and Cry1/2 knocked-down under an ad libitum (AL) or night restricted feeding (RF) regimen. Data were collected at 0, 4h, 8h, 16, 20h and 24h. Therefore, there are six time points and eight biological conditions. As there are only two mice per biological condition, we decided not to take into account the effect of the regimen. The dataset contains temporal expression data of 40327 genes.

To illustrate the use of our package, we take 500 genes, over the global 40327 genes in the original dataset. This sub dataset is saved in the file RawCounts_Weger2021_MOUSEsub500.

Quick workflow

Load package, example dataset and preamble

  1. Load MultiRNAflow
library(MultiRNAflow)
  1. Load Mouse dataset 2
data("RawCounts_Weger2021_MOUSEsub500")
  1. Structure of the dataset (Preamble)

The dataset must be a data.frame containing raw counts data. If it is not the case, the functions DATAnormalization() andDEanalysisGlobal() will stop and print an error. Each line should correspond to a gene, each column to a sample, except a particular column that may contain strings of characters describing the names of the genes. The first line of the data.frame should contain the names of the columns (strings of characters) that must have the following structure.

head(colnames(RawCounts_Weger2021_MOUSEsub500), n=5)
#> [1] "Gene"       "BmKo_t1_r1" "BmKo_t2_r1" "BmKo_t0_r1" "BmKo_t3_r1"

In this example, “Gene” indicates the column which contains the names of the different genes. The other column names contain all kind of information about the sample, including the biological condition, the time of measurement and the name of the individual (e.g patient, replicate, mouse, yeasts culture…). Other kinds of information can be stored in the column names (such as patient information), but they will not be used by the package. The various information in the column names must be separated by underscores. The order of these information is arbitrary but must be the same for all columns. For instance, the sample “BmKo_t0_r1” corresponds to the first replicate (r1) of the biological condition BmKo at time t0 (reference time).

The information located to the left of the first underscore will be considered to be in position 1, the information located between the first underscore and the second one will be considered to be in position 2, and so on. In the previous example, the biological condition is in position 1, the time is in position 2 and the replicate is in position 3. In most of the functions of our package, the order of the previous information in the column names will be indicated with the inputs Group.position,Time.position and Individual.position. Similarly the input Column.gene will indicate the number of the column containing gene names.

Preprocessing

  1. Preprocessing with DATAprepSE()
resDATAprepSE <- DATAprepSE(RawCounts=RawCounts_Weger2021_MOUSEsub500,
                            Column.gene=1,
                            Group.position=1,
                            Time.position=2,
                            Individual.position=3)

Exploratory data analysis

  1. Normalization with DATAnormalization()
resNorm <- DATAnormalization(SEres=resDATAprepSE,
                             Normalization="vst",
                             Blind.rlog.vst=FALSE,
                             Plot.Boxplot=FALSE,
                             Colored.By.Factors=TRUE,
                             Color.Group=NULL,
                             path.result=NULL)
  1. Principal component analysis (PCA) with PCAanalysis()
resPCA <- PCAanalysis(SEresNorm=resNorm,
                      DATAnorm=TRUE,
                      gene.deletion=NULL,
                      sample.deletion=NULL,
                      Plot.PCA=FALSE,
                      Mean.Accross.Time=FALSE,
                      Color.Group=NULL,
                      Cex.label=0.6,
                      Cex.point=0.7, epsilon=0.2,
                      Phi=25,Theta=140,
                      motion3D=FALSE,
                      path.result=NULL)
  1. Hierarchical Clustering on Principle Components (HCPC) with HCPCanalysis()
resHCPC <- HCPCanalysis(SEresNorm=resNorm,
                        DATAnorm=TRUE,
                        gene.deletion=NULL,
                        sample.deletion=NULL,
                        Plot.HCPC=FALSE,
                        Phi=25,Theta=140,
                        Cex.point=0.6,
                        epsilon=0.2,
                        Cex.label=0.6,
                        motion3D=FALSE,
                        path.result=NULL)
  1. Temporal clustering analysis with MFUZZanalysis().
resMFUZZ <- MFUZZanalysis(SEresNorm=resNorm,
                          DATAnorm=TRUE,
                          DataNumberCluster=NULL,
                          Method="hcpc",
                          Membership=0.5,
                          Min.std=0.1,
                          Plot.Mfuzz=FALSE,
                          path.result=NULL)
  1. Plot temporal expression with with DATAplotExpressionGenes()
resEXPR <- DATAplotExpressionGenes(SEresNorm=resNorm,
                                   DATAnorm=TRUE,
                                   Vector.row.gene=c(17),
                                   Color.Group=NULL,
                                   Plot.Expression=FALSE,
                                   path.result=NULL)

Supervised statistical analysis

  1. Differential Expresion (DE) analysis with DEanalysisGlobal()

For the speed of the algorithm, we will take only three biological conditions and three times

Sub3bc3T <- RawCounts_Weger2021_MOUSEsub500
Sub3bc3T <- Sub3bc3T[, seq_len(73)]
SelectTime <- grep("_t0_", colnames(Sub3bc3T))
SelectTime <- c(SelectTime, grep("_t1_", colnames(Sub3bc3T)))
SelectTime <- c(SelectTime, grep("_t2_", colnames(Sub3bc3T)))
Sub3bc3T <- Sub3bc3T[, c(1, SelectTime)]

resSEsub <- DATAprepSE(RawCounts=Sub3bc3T,
                       Column.gene=1,
                       Group.position=1,
                       Time.position=2,
                       Individual.position=3)
resDE <- DEanalysisGlobal(SEres=resSEsub,
                          pval.min=0.05,
                          log.FC.min=1,
                          Plot.DE.graph=FALSE,
                          path.result=NULL)
#> [1] "Preprocessing"
#> [1] "Differential expression step with DESeq2::DESeq()"
#> [1] "Case 3 analysis : Biological conditions and Times."
#> [1] "DE time analysis for each biological condition."
#> [1] "DE group analysis for each time measurement."
#> [1] "Combined time and group results."
  1. Volcano and ratio intensity (MA) plots with DEplotVolcanoMA()
resMAvolcano <- DEplotVolcanoMA(SEresDE=resDE,
                                NbGene.plotted=2,
                                SizeLabel=3,
                                Display.plots=FALSE,
                                Save.plots=FALSE)
  1. Heatmaps with DEplotHeatmaps()
resHEAT <- DEplotHeatmaps(SEresDE=resDE,
                          ColumnsCriteria=2,
                          Set.Operation="union",
                          NbGene.analysis=20,
                          SizeLabelRows=5,
                          SizeLabelCols=5,
                          Display.plots=FALSE,
                          Save.plots=TRUE)
  1. GO enrichment analysis with GSEAQuickAnalysis() and GSEApreprocessing()
resRgprofiler2 <- GSEAQuickAnalysis(Internet.Connection=FALSE,
                                    SEresDE=resDE,
                                    ColumnsCriteria=2,
                                    ColumnsLog2ordered=NULL,
                                    Set.Operation="union",
                                    Organism="mmusculus",
                                    MaxNumberGO=20,
                                    Background=FALSE,
                                    Display.plots=FALSE,
                                    Save.plots=TRUE)
resGSEApreprocess <- GSEApreprocessing(SEresDE=resDE,
                                       ColumnsCriteria=2,
                                       Set.Operation="union",
                                       Rnk.files=FALSE,
                                       Save.files=FALSE)

SessionInfo

Here is the output of sessionInfo() on the system on which this document was compiled.

sessionInfo()
#> R version 4.5.0 RC (2025-04-04 r88126)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] tcltk     stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#> [1] MultiRNAflow_1.6.0  Mfuzz_2.68.0        DynDoc_1.86.0      
#> [4] widgetTools_1.86.0  e1071_1.7-16        Biobase_2.68.0     
#> [7] BiocGenerics_0.54.0 generics_0.1.3      BiocStyle_2.36.0   
#> 
#> loaded via a namespace (and not attached):
#>   [1] RColorBrewer_1.1-3          jsonlite_2.0.0             
#>   [3] shape_1.4.6.1               magrittr_2.0.3             
#>   [5] magick_2.8.6                TH.data_1.1-3              
#>   [7] estimability_1.5.1          farver_2.1.2               
#>   [9] rmarkdown_2.29              GlobalOptions_0.1.2        
#>  [11] fs_1.6.6                    vctrs_0.6.5                
#>  [13] Cairo_1.6-2                 base64enc_0.1-3            
#>  [15] rstatix_0.7.2               htmltools_0.5.8.1          
#>  [17] S4Arrays_1.8.0              broom_1.0.8                
#>  [19] Formula_1.2-5               SparseArray_1.8.0          
#>  [21] gridGraphics_0.5-1          sass_0.4.10                
#>  [23] bslib_0.9.0                 htmlwidgets_1.6.4          
#>  [25] plyr_1.8.9                  sandwich_3.1-1             
#>  [27] emmeans_1.11.0              plotly_4.10.4              
#>  [29] zoo_1.8-14                  cachem_1.1.0               
#>  [31] misc3d_0.9-1                lifecycle_1.0.4            
#>  [33] iterators_1.0.14            pkgconfig_2.0.3            
#>  [35] Matrix_1.7-3                R6_2.6.1                   
#>  [37] fastmap_1.2.0               GenomeInfoDbData_1.2.14    
#>  [39] MatrixGenerics_1.20.0       clue_0.3-66                
#>  [41] digest_0.6.37               colorspace_2.1-1           
#>  [43] S4Vectors_0.46.0            DESeq2_1.48.0              
#>  [45] GenomicRanges_1.60.0        ggpubr_0.6.0               
#>  [47] labeling_0.4.3              httr_1.4.7                 
#>  [49] abind_1.4-8                 compiler_4.5.0             
#>  [51] proxy_0.4-27                withr_3.0.2                
#>  [53] doParallel_1.0.17           backports_1.5.0            
#>  [55] BiocParallel_1.42.0         viridis_0.6.5              
#>  [57] carData_3.0-5               UpSetR_1.4.0               
#>  [59] dendextend_1.19.0           Rttf2pt1_1.3.12            
#>  [61] ggsignif_0.6.4              MASS_7.3-65                
#>  [63] tkWidgets_1.86.0            DelayedArray_0.34.0        
#>  [65] rjson_0.2.23                scatterplot3d_0.3-44       
#>  [67] flashClust_1.01-2           tools_4.5.0                
#>  [69] extrafontdb_1.0             FactoMineR_2.11            
#>  [71] glue_1.8.0                  grid_4.5.0                 
#>  [73] cluster_2.1.8.1             reshape2_1.4.4             
#>  [75] gtable_0.3.6                class_7.3-23               
#>  [77] tidyr_1.3.1                 data.table_1.17.0          
#>  [79] car_3.1-3                   XVector_0.48.0             
#>  [81] ggrepel_0.9.6               foreach_1.5.2              
#>  [83] pillar_1.10.2               stringr_1.5.1              
#>  [85] yulab.utils_0.2.0           circlize_0.4.16            
#>  [87] splines_4.5.0               dplyr_1.1.4                
#>  [89] lattice_0.22-7              survival_3.8-3             
#>  [91] tidyselect_1.2.1            ComplexHeatmap_2.24.0      
#>  [93] locfit_1.5-9.12             knitr_1.50                 
#>  [95] gridExtra_2.3               bookdown_0.43              
#>  [97] IRanges_2.42.0              SummarizedExperiment_1.38.0
#>  [99] stats4_4.5.0                xfun_0.52                  
#> [101] plot3Drgl_1.0.4             factoextra_1.0.7           
#> [103] matrixStats_1.5.0           DT_0.33                    
#> [105] stringi_1.8.7               UCSC.utils_1.4.0           
#> [107] lazyeval_0.2.2              yaml_2.3.10                
#> [109] evaluate_1.0.3              codetools_0.2-20           
#> [111] extrafont_0.19              tibble_3.2.1               
#> [113] BiocManager_1.30.25         multcompView_0.1-10        
#> [115] ggplotify_0.1.2             cli_3.6.4                  
#> [117] xtable_1.8-4                munsell_0.5.1              
#> [119] jquerylib_0.1.4             Rcpp_1.0.14                
#> [121] GenomeInfoDb_1.44.0         gprofiler2_0.2.3           
#> [123] coda_0.19-4.1               png_0.1-8                  
#> [125] parallel_4.5.0              leaps_3.2                  
#> [127] rgl_1.3.18                  ggplot2_3.5.2              
#> [129] ggalluvial_0.12.5           viridisLite_0.4.2          
#> [131] mvtnorm_1.3-3               plot3D_1.4.1               
#> [133] scales_1.3.0                purrr_1.0.4                
#> [135] crayon_1.5.3                GetoptLong_1.0.5           
#> [137] rlang_1.1.6                 multcomp_1.4-28

Weger, Benjamin D., Cédric Gobet, Fabrice P. A. David, Florian Atger, Eva Martin, Nicholas E. Phillips, Aline Charpagne, Meltem Weger, Felix Naef, and Frédéric Gachon. 2021. “Systematic Analysis of Differential Rhythmic Liver Gene Expression Mediated by the Circadian Clock and Feeding Rhythms.” Proceedings of the National Academy of Sciences 118 (3): e2015803118. https://doi.org/10.1073/pnas.2015803118.