Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation

Davis J McCarthy et al. Nucleic Acids Res. 2012 May.

Abstract

A flexible statistical framework is developed for the analysis of read counts from RNA-Seq gene expression studies. It provides the ability to analyse complex experiments involving multiple treatment conditions and blocking variables while still taking full account of biological variation. Biological variation between RNA samples is estimated separately from the technical variation associated with sequencing technologies. Novel empirical Bayes methods allow each gene to have its own specific variability, even when there are relatively few biological replicates from which to estimate such variability. The pipeline is implemented in the edgeR package of the Bioconductor project. A case study analysis of carcinoma data demonstrates the ability of generalized linear model (GLM) methods to detect differential expression in a paired design, and even to detect tumour-specific expression changes. The case study demonstrates the need to allow for gene-specific variability, rather than assuming a common dispersion across genes or a fixed relationship between abundance and variability. Genewise dispersions de-prioritize genes with inconsistent results and allow the main analysis to focus on changes that are consistent between biological replicates. Parallel computational approaches are developed to make non-linear model fitting faster and more reliable, making the application of GLMs to genomic data more convenient and practical. Simulations demonstrate the ability of adjusted profile likelihood estimators to return accurate estimates of biological variability in complex situations. When variation is gene-specific, empirical Bayes estimators provide an advantageous compromise between the extremes of assuming a common dispersion or separate genewise dispersions. The methods developed here can also be applied to count data arising from DNA-Seq applications, including ChIP-Seq for epigenetic marks and DNA methylation analyses.
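The separation of biological from technical variation can be illustrated with the negative binomial variance function. The sketch below is illustrative only: it assumes a gene with expected count `mu` and a dispersion equal to the squared biological coefficient of variation (BCV), so that the squared total CV is 1/mu (technical, Poisson-like) plus BCV² (biological). The function names are hypothetical and are not part of edgeR.

```python
import math

def nb_variance(mu, bcv):
    """Negative binomial variance: mu + dispersion * mu^2, with dispersion = BCV^2."""
    dispersion = bcv ** 2
    return mu + dispersion * mu ** 2

def total_cv(mu, bcv):
    """Total coefficient of variation, combining technical and biological parts."""
    technical_cv2 = 1.0 / mu   # Poisson-like sequencing noise, shrinks as counts grow
    biological_cv2 = bcv ** 2  # variation between RNA samples, independent of depth
    return math.sqrt(technical_cv2 + biological_cv2)

# The total CV approaches the BCV as the expected count increases.
for mu in (10, 100, 10000):
    print(mu, round(total_cv(mu, 0.4), 3))
```

Under this model, sequencing more deeply reduces only the technical term; the biological term sets a floor on the total variability.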

Figures

Figure 1.

Multidimensional scaling plot of the squamous cell carcinoma profiles in which distances correspond to BCV between pairs of samples. Pairwise BCVs were computed from the 500 most heterogeneous genes. Samples are labelled with patient number and either ‘T’ for tumour or ‘N’ for normal. The first plot dimension roughly corresponds to tissue source (normal or tumour) and the second to patient differences. The tumour samples are more heterogeneous than the normals.

Figure 2.

QQ-plots of goodness of fit statistics using common, trended or empirical Bayes genewise (tagwise) dispersions. Genewise deviance statistics were transformed to normality, and plotted against theoretical normal quantiles. Points in blue are those genes with a significantly poor fit (Holm-adjusted _P_-value < 0.05). When using genewise dispersions, no genes show a significantly poor fit.
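The Holm adjustment used to flag poorly fitting genes is a standard step-down procedure. A minimal sketch follows, assuming a plain list of per-gene goodness-of-fit P-values; `holm_adjust` and `poorly_fitted` are hypothetical helper names, not functions from edgeR.

```python
def holm_adjust(pvalues):
    """Return Holm-adjusted P-values in the same order as the input."""
    m = len(pvalues)
    # Indices of the P-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest P-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity across the sorted sequence.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

def poorly_fitted(pvalues, alpha=0.05):
    """Indices of genes whose Holm-adjusted P-value falls below alpha."""
    return [i for i, p in enumerate(holm_adjust(pvalues)) if p < alpha]
```

Because Holm controls the family-wise error rate, a gene is only flagged when its lack of fit is strong enough to survive correction across all genes tested.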

Figure 3.

Boxplots of common BCV estimates from 100 simulated data sets. The left panel shows results for the one-group case, with three replicate samples in the group. The right panel shows results for a paired design with two groups and three blocks. The horizontal lines indicate the true common BCV of 0.4, chosen to match the carcinoma case study. Quantile-adjusted conditional maximum likelihood (qCML) is the most accurate in the former case. For generalized linear models, Cox–Reid APL is the best performer.
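The paired design in the right panel can be written out as a design matrix. The sketch below is an assumption about one conventional coding (intercept, block indicators, group indicator); the labels `control` and `treated` and the function name are hypothetical and do not come from the paper.

```python
def paired_design(n_blocks, groups=("control", "treated")):
    """Build a treatment-coded design matrix for a paired (blocked) experiment.

    Columns: intercept, one indicator per block after the first,
    one indicator per group after the first.
    """
    rows = []
    for block in range(n_blocks):
        for g, _ in enumerate(groups):
            intercept = [1]
            block_cols = [1 if block == b else 0 for b in range(1, n_blocks)]
            group_cols = [1 if g == k else 0 for k in range(1, len(groups))]
            rows.append(intercept + block_cols + group_cols)
    return rows

design = paired_design(3)           # two groups, three blocks
n_samples = len(design)             # one row per sample
n_coefs = len(design[0])            # one column per fitted coefficient
residual_df = n_samples - n_coefs   # degrees of freedom left for dispersion
print(n_samples, n_coefs, residual_df)
```

With six samples and four coefficients, only two residual degrees of freedom remain per gene, which is why information must be shared between genes to estimate dispersion.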

Figure 4.

Mean-square error with which empirical Bayes genewise dispersions estimate the true dispersion (BCV²), when true dispersions are randomly generated. In this case, the optimal prior weight is 10–12 prior genes, equivalent to 20–24 prior degrees of freedom. The common BCV estimator is equivalent to using infinite weight for the prior. Boxplots show results for 10 simulations.
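The equivalence between prior genes and prior degrees of freedom is simple arithmetic once the residual degrees of freedom per gene are known. The sketch below assumes 2 residual degrees of freedom per gene, which is what the caption's ratio implies. The shrinkage function is a simplified weighted average meant only to illustrate the compromise between genewise and common dispersions; it is not the weighted-likelihood procedure implemented in edgeR, and both function names are hypothetical.

```python
def prior_degrees_of_freedom(prior_genes, residual_df):
    """Prior weight expressed in degrees of freedom."""
    return prior_genes * residual_df

def shrink_dispersion(genewise, common, residual_df, prior_df):
    """Weighted-average illustration of shrinking a genewise dispersion
    towards the common value (a simplification, see lead-in)."""
    if prior_df == float("inf"):
        # Infinite prior weight reproduces the common dispersion estimator.
        return common
    total = residual_df + prior_df
    return (residual_df * genewise + prior_df * common) / total

# Assumed residual df of 2 per gene, as implied by the caption.
for prior_genes in (10, 12):
    print(prior_genes, prior_degrees_of_freedom(prior_genes, residual_df=2))
```

Setting `prior_df` to 0 returns the raw genewise estimate, and setting it to infinity returns the common dispersion, which are the two extremes compared in the figure.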

Cited by

Time-resolved systems immunology reveals a late juncture linked to fatal COVID-19.
Liu C, Martins AJ, Lau WW, Rachmaninoff N, Chen J, Imberti L, Mostaghimi D, Fink DL, Burbelo PD, Dobbs K, Delmonte OM, Bansal N, Failla L, Sottini A, Quiros-Roldan E, Han KL, Sellers BA, Cheung F, Sparks R, Chun TW, Moir S, Lionakis MS; NIAID COVID Consortium; COVID Clinicians; Rossi C, Su HC, Kuhns DB, Cohen JI, Notarangelo LD, Tsang JS. Cell. 2021 Apr 1;184(7):1836-1857.e22. doi: 10.1016/j.cell.2021.02.018. Epub 2021 Feb 10. PMID: 33713619.
The Proteomic and Transcriptomic Landscapes Altered by Rgg2/3 Activity in Streptococcus pyogenes.
Rued BE, Anderson CM, Federle MJ. J Bacteriol. 2022 Nov 15;204(11):e0017522. doi: 10.1128/jb.00175-22. Epub 2022 Oct 31. PMID: 36314832.
An R package for generic modular response analysis and its application to estrogen and retinoic acid receptor crosstalk.
Jimenez-Dominguez G, Ravel P, Jalaguier S, Cavaillès V, Colinge J. Sci Rep. 2021 Mar 31;11(1):7272. doi: 10.1038/s41598-021-86544-0. PMID: 33790340.
Metabolic Imaging Detects Resistance to PI3Kα Inhibition Mediated by Persistent FOXM1 Expression in ER+ Breast Cancer.
Ros S, Wright AJ, D'Santos P, Hu DE, Hesketh RL, Lubling Y, Georgopoulou D, Lerda G, Couturier DL, Razavi P, Pelossof R, Batra AS, Mannion E, Lewis DY, Martin A, Baird RD, Oliveira M, de Boo LW, Linn SC, Scaltriti M, Rueda OM, Bruna A, Caldas C, Brindle KM. Cancer Cell. 2020 Oct 12;38(4):516-533.e9. doi: 10.1016/j.ccell.2020.08.016. Epub 2020 Sep 24. PMID: 32976773.
NBLDA: negative binomial linear discriminant analysis for RNA-Seq data.
Dong K, Zhao H, Tong T, Wan X. BMC Bioinformatics. 2016 Sep 13;17(1):369. doi: 10.1186/s12859-016-1208-1. PMID: 27623864.

