RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR
Charity W Law et al. F1000Res. 2016.
Abstract
The ability to easily and efficiently analyse RNA-sequencing data is a key strength of the Bioconductor project. Starting with counts summarised at the gene-level, a typical analysis involves pre-processing, exploratory data analysis, differential expression testing and pathway analysis with the results obtained informing future experiments and validation studies. In this workflow article, we analyse RNA-sequencing data from the mouse mammary gland, demonstrating use of the popular edgeR package to import, organise, filter and normalise the data, followed by the limma package with its voom method, linear modelling and empirical Bayes moderation to assess differential expression and perform gene set testing. This pipeline is further enhanced by the Glimma package which enables interactive exploration of the results so that individual samples and genes can be examined by the user. The complete analysis offered by these three packages highlights the ease with which researchers can turn the raw counts from an RNA-sequencing experiment into biological insights using Bioconductor.
Keywords: RNA sequencing; data analysis; gene expression.
Conflict of interest statement
No competing interests were disclosed.
Figures
Figure 1.
The densities of log-CPM values for raw pre-filtered data (A) and post-filtered data (B) are shown for each sample. Dotted vertical lines mark the log-CPM threshold (equivalent to a CPM value of about 0.2) used in the filtering step.
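The CPM and log-CPM quantities in this caption can be sketched in a few lines of Python. This is a simplified illustration, not edgeR's implementation: the function names, the prior count of 2 and the "at least 3 samples" rule are assumptions made for the example; only the CPM threshold of about 0.2 comes from the caption.

```python
import math

def cpm(counts, lib_size):
    """Counts per million for one sample: count / library size * 1e6."""
    return [c / lib_size * 1e6 for c in counts]

def log_cpm(counts, lib_size, prior_count=2.0):
    """Simplified log2-CPM. prior_count (an assumed value) avoids log2(0)."""
    return [
        math.log2((c + prior_count) / (lib_size + 2.0 * prior_count) * 1e6)
        for c in counts
    ]

def passes_filter(gene_cpms, cpm_threshold=0.2, min_samples=3):
    """Keep a gene whose CPM exceeds the threshold in at least min_samples samples.

    min_samples is a hypothetical choice for illustration only.
    """
    return sum(v > cpm_threshold for v in gene_cpms) >= min_samples
```

Here `passes_filter` takes the CPM values of one gene across all samples, so it would be applied gene by gene after `cpm` has been computed for every sample.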
Figure 2.
Example data: Boxplots of log-CPM values showing expression distributions for unnormalised data (A) and normalised data (B) for each sample in the modified dataset where the counts in samples 1 and 2 have been scaled to 5% and 500% of their original values respectively.
Figure 3.
MDS plots of log-CPM values over dimensions 1 and 2 with samples coloured and labeled by sample groups (A) and over dimensions 3 and 4 with samples coloured and labeled by sequencing lane (B). Distances on the plot correspond to the leading fold-change, which is the average (root-mean-square) log2-fold-change for the 500 genes most divergent between each pair of samples by default.
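The leading fold-change distance described in this caption can be illustrated as follows. This is a minimal sketch based only on the caption's definition (root-mean-square of the log2-fold-changes for the most divergent genes between a pair of samples); the function name and argument layout are assumptions, and it is not the code used by the plotting function itself.

```python
import math

def leading_log_fc(sample_a, sample_b, top=500):
    """Root-mean-square log2-fold-change over the `top` most divergent genes.

    sample_a and sample_b are log-CPM values for the same genes, in the same order.
    """
    # Squared log2-fold-changes, largest first.
    squared = sorted(((a - b) ** 2 for a, b in zip(sample_a, sample_b)), reverse=True)
    k = min(top, len(squared))
    if k == 0:
        raise ValueError("at least one gene is required")
    return math.sqrt(sum(squared[:k]) / k)
```

Because only the most divergent genes are used, the set of genes contributing to the distance can differ from one pair of samples to the next.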
Figure 4.
Means (x-axis) and variances (y-axis) of each gene are plotted to show the dependence between the two before voom is applied to the data (A) and how the trend is removed after voom precision weights are applied to the data (B). The plot on the left is created within the voom function, which extracts residual variances from fitting linear models to log-CPM transformed data. Variances are then rescaled to quarter-root variances (or square-root of standard deviations) and plotted against the mean expression of each gene. The means are log2-transformed mean-counts with an offset of 2. The plot on the right is created using plotSA, which plots log2 residual standard deviations against mean log-CPM values. The average log2 residual standard deviation is marked by a horizontal blue line. In both plots, each black dot represents a gene and a red curve is fitted to these points.
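The two y-axis scales mentioned in this caption are simple transformations of a gene's residual variance. The sketch below shows the arithmetic only, under the caption's description; the function names are hypothetical and this is not the code inside voom or plotSA.

```python
import math

def quarter_root_variance(residual_variance):
    """Quarter-root of the variance, i.e. the square root of the standard deviation."""
    return residual_variance ** 0.25

def log2_standard_deviation(residual_variance):
    """log2 of the residual standard deviation, as described for the right-hand panel."""
    return math.log2(math.sqrt(residual_variance))
```

For a residual variance of 16, the standard deviation is 4, so the quarter-root variance is the square root of 4.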
Figure 5.
Venn diagram showing the number of genes DE in the comparison between basal versus LP only (left), basal versus ML only (right), and the number of genes that are DE in both comparisons (center). The number of genes that are not DE in either comparison is marked in the bottom-right.
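The four counts in such a Venn diagram follow directly from set operations. The sketch below assumes the DE genes of each comparison are available as sets of identifiers; the function name and the gene identifiers in the usage note are hypothetical placeholders, not results from the study.

```python
def venn_counts(all_genes, de_first, de_second):
    """Return (first only, both, second only, neither) counts for two DE gene sets."""
    first_only = de_first - de_second
    both = de_first & de_second
    second_only = de_second - de_first
    neither = all_genes - (de_first | de_second)
    return len(first_only), len(both), len(second_only), len(neither)
```

For example, calling `venn_counts({"g1", "g2", "g3", "g4", "g5", "g6"}, {"g1", "g2", "g3"}, {"g3", "g4"})` counts two genes in the first comparison only, one in both, one in the second only, and two in neither.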
Figure 6.
Interactive mean-difference plot generated using Glimma. Summary data (log-FCs versus log-CPM values) are shown in the left panel which is linked to the individual values per sample for a selected gene in the right panel. A table of results is also displayed below these figures, along with a search bar to allow users to look up a particular gene using the annotation information available, e.g. the Gene symbol identifier Clu.
Figure 7.
Heatmap of log-CPM values for top 100 genes DE in basal versus LP. Expression across each gene (or row) has been scaled so that mean expression is zero and standard deviation is one. Samples with relatively high expression of a given gene are marked in red and samples with relatively low expression are marked in blue. Lighter shades and white represent genes with intermediate expression levels. Samples and genes have been reordered by the method of hierarchical clustering. A dendrogram is shown for the sample clustering.
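The row scaling described in this caption is a per-gene z-score. The sketch below is one way to compute it, assuming the sample standard deviation (n - 1 denominator); the function name and the handling of constant rows (returned as zeros) are assumptions made for this example.

```python
from statistics import mean, stdev

def scale_row(values):
    """Centre one gene's log-CPM values to mean 0 and scale to standard deviation 1."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        # Assumption: a gene with constant expression maps to all zeros.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]
```

After scaling, colours in the heatmap reflect each sample's expression relative to the other samples for the same gene, not the absolute expression level.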
Figure 8.
Barcode plot of LIM_MAMMARY_LUMINAL_MATURE_UP (red bars, top of plot) and LIM_MAMMARY_LUMINAL_MATURE_DN (blue bars, bottom of plot) gene sets in the LP versus ML contrast. For each set, an enrichment line that shows the relative enrichment of the vertical bars in each part of the plot is displayed. The experiment of Lim et al. (2010) is very similar to the current one, with the same sorting strategy used to obtain the different cell populations, except that microarrays were used instead of RNA-seq to profile gene expression. Note that the inverse correlation (the up gene set is down and the down gene set is up) is a result of the way the contrast has been set up (LP versus ML); if reversed, the directionality would agree.
Grants and funding
This work was funded by the National Health and Medical Research Council (NHMRC) (Fellowship GNT1058892 and Program GNT1054618 to GKS, Project GNT1050661 to MER and GKS and Fellowship GNT1104924 to MER), Victorian State Government Operational Infrastructure Support and Australian Government NHMRC IRIISS.