Computational and analytical challenges in single-cell transcriptomics
Bernstein, B. E. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Brawand, D. et al. The evolution of gene expression levels in mammalian organs. Nature 478, 343–348 (2011).
Blekhman, R., Oshlack, A., Chabot, A. E., Smyth, G. K. & Gilad, Y. Gene regulation in primates evolves under tissue-specific selection pressures. PLoS Genet. 4, e1000271 (2008).
Deng, Q., Ramskold, D., Reinius, B. & Sandberg, R. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science 343, 193–196 (2014).
Barreiro, L. B. et al. Deciphering the genetic architecture of variation in the immune response to Mycobacterium tuberculosis infection. Proc. Natl Acad. Sci. USA 109, 1204–1209 (2012).
Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352 (2012).
Shapiro, E., Biezuner, T. & Linnarsson, S. Single-cell sequencing-based technologies will revolutionize whole-organism science. Nature Rev. Genet. 14, 618–630 (2013). This is a related review discussing challenges and analysis opportunities of single-cell sequencing, for example, to reconstruct lineages in cancer.
Perou, C. M. et al. Molecular portraits of human breast tumours. Nature 406, 747–752 (2000).
Marioni, J. C., Mason, C. E., Mane, S. M., Stephens, M. & Gilad, Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 18, 1509–1517 (2008).
Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-seq. Nature Methods 5, 621–628 (2008).
Nagalakshmi, U. et al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320, 1344–1349 (2008).
Perry, G. H. et al. Comparative RNA sequencing reveals substantial genetic variation in endangered primates. Genome Res. 22, 602–610 (2012).
van 't Veer, L. J. et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536 (2002).
Sandberg, R. Entering the era of single-cell transcriptomics in biology and medicine. Nature Methods 11, 22–24 (2014).
Ohnishi, Y. et al. Cell-to-cell expression variability followed by signal reinforcement progressively segregates early mouse lineages. Nature Cell Biol. 16, 27–37 (2014).
Skamagki, M., Wicher, K. B., Jedrusik, A., Ganguly, S. & Zernicka-Goetz, M. Asymmetric localization of Cdx2 mRNA during the first cell-fate decision in early mouse development. Cell Rep. 3, 442–457 (2013).
Tang, F. et al. Tracing the derivation of embryonic stem cells from the inner cell mass by single-cell RNA-Seq analysis. Cell Stem Cell 6, 468–478 (2010).
Diez-Roux, G. et al. A high-resolution anatomical atlas of the transcriptome in the mouse embryo. PLoS Biol. 9, e1000582 (2011).
Munsky, B., Neuert, G. & van Oudenaarden, A. Using gene expression noise to understand gene regulation. Science 336, 183–187 (2012).
Raj, A. & van Oudenaarden, A. Nature, nurture, or chance: stochastic gene expression and its consequences. Cell 135, 216–226 (2008).
Chalfie, M., Tu, Y., Euskirchen, G., Ward, W. W. & Prasher, D. C. Green fluorescent protein as a marker for gene expression. Science 263, 802–805 (1994).
Coons, A. H., Creech, H. J. & Jones, R. N. Immunological properties of an antibody containing a fluorescent group. Proc. Soc. Exp. Biol. Med. 47, 200–202 (1941).
Taniguchi, K., Kajiyama, T. & Kambara, H. Quantitative analysis of gene expression in a single cell by qPCR. Nature Methods 6, 503–506 (2009).
Raj, A., van den Bogaard, P., Rifkin, S. A., van Oudenaarden, A. & Tyagi, S. Imaging individual mRNA molecules using multiple singly labeled probes. Nature Methods 5, 877–879 (2008).
Faddah, D. A. et al. Single-cell analysis reveals that expression of nanog is biallelic and equally variable as that of other pluripotency factors in mouse ESCs. Cell Stem Cell 13, 23–29 (2013).
Tang, F. et al. mRNA-seq whole-transcriptome analysis of a single cell. Nature Methods 6, 377–382 (2009).
Islam, S. et al. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res. 21, 1160–1167 (2011).
Ramskold, D. et al. Full-length mRNA-seq from single-cell levels of RNA and individual circulating tumor cells. Nature Biotech. 30, 777–782 (2012).
Sasagawa, Y. et al. Quartz-seq: a highly reproducible and sensitive single-cell RNA sequencing method, reveals non-genetic gene-expression heterogeneity. Genome Biol. 14, R31 (2013).
Jaitin, D. A. et al. Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science 343, 776–779 (2014).
Hashimshony, T., Wagner, F., Sher, N. & Yanai, I. CEL-seq: single-cell RNA-seq by multiplexed linear amplification. Cell Rep. 2, 666–673 (2012).
Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nature Methods 10, 1096–1098 (2013). Recent protocol developments, such as the development of Smart-seq2, have helped to substantially reduce biases and improved the sensitivity of scRNA-seq.
Brennecke, P. et al. Accounting for technical noise in single-cell RNA-seq experiments. Nature Methods 10, 1093–1095 (2013). This paper reports a statistical approach that estimates and accounts for technical sources of variation in scRNA-seq experiments. This method exploits spike-ins to separate technical and biological variability of individual genes (see also reference 75).
Wu, A. R. et al. Quantitative assessment of single-cell RNA-sequencing methods. Nature Methods 11, 41–46 (2014).
Patel, A. P. et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344, 1396–1401 (2014). This paper provides an example in which sequencing the transcriptomes of a large number of single cells provided important insights into intra- and inter-tumour heterogeneity.
Shalek, A. K. et al. Single-cell RNA-seq reveals dynamic paracrine control of cellular variation. Nature 509, 363–369 (2014).
Wang, Z., Gerstein, M. & Snyder, M. RNA-seq: a revolutionary tool for transcriptomics. Nature Rev. Genet. 10, 57–63 (2009).
Oshlack, A., Robinson, M. D. & Young, M. D. From RNA-seq reads to differential expression results. Genome Biol. 11, 220 (2010).
Islam, S. et al. Quantitative single-cell RNA-seq with unique molecular identifiers. Nature Methods 11, 163–166 (2014). UMIs allow individual molecules to be barcoded. This protocol enables the absolute number of transcribed molecules to be estimated independently of amplification biases.
Fonseca, N. A., Rung, J., Brazma, A. & Marioni, J. C. Tools for mapping high-throughput sequencing data. Bioinformatics 28, 3169–3177 (2012).
Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-seq. Bioinformatics 25, 1105–1111 (2009).
Wu, T. D. & Nacu, S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26, 873–881 (2010).
Guttman, M. et al. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nature Biotech. 28, 503–510 (2010).
Anders, S., Pyl, P. T. & Huber, W. HTseq: a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).
Davis, M. P., van Dongen, S., Abreu-Goodger, C., Bartonicek, N. & Enright, A. J. Kraken: a set of tools for quality control and analysis of high-throughput sequence data. Methods 63, 41–49 (2013).
Robinson, J. T. et al. Integrative genomics viewer. Nature Biotech. 29, 24–26 (2011).
Thorvaldsdottir, H., Robinson, J. T. & Mesirov, J. P. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 14, 178–192 (2013).
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010). This seminal paper describes statistical methods to test for differential gene expression using RNA-seq data. Although developed in the context of RNA-seq studies on bulk cell populations, this work has laid the foundation for a large family of normalization procedures, including recent methods that are dedicated to scRNA-seq data (see reference 33).
Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010).
Krebs, J. E., Goldstein, E. S. & Kilpatrick, S. T. Lewin's Genes XI (Jones & Bartlett Publishers, 2014).
Kharchenko, P. V., Silberstein, L. & Scadden, D. T. Bayesian approach to single-cell differential expression analysis. Nature Methods 11, 740–742 (2014). This paper presents a Bayesian approach to test for differential gene expression in scRNA-seq studies. This approach extends methods for bulk RNA-seq (for example, reference 50) by accounting for single-cell-specific noise, such as dropout events and amplification biases.
Pickrell, J. K. et al. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 464, 768–772 (2010).
Leek, J. T. & Storey, J. D. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 3, 1724–1735 (2007).
Stegle, O., Parts, L., Durbin, R. & Winn, J. A. Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLoS Comput. Biol. 6, e1000770 (2010).
Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nature Protoc. 7, 500–507 (2012).
Risso, D., Ngai, J., Speed, T. P. & Dudoit, S. Normalization of RNA-seq data using factor analysis of control genes or samples. Nature Biotech. 32, 896–902 (2014).
Buettner, F. et al. Accounting for cell-to-cell heterogeneity in single-cell RNA-seq data reveals novel structure between cells. Nature Biotech. http://dx.doi.org/10.1038/nbt.3102 (2015). Confounding factors such as the cell cycle can obscure biologically relevant molecular signatures in scRNA-seq data sets. This work describes a computational approach to account for confounding factors. Related methods developed for bulk RNA profiling experiments are described in references 57–60.
Treutlein, B. et al. Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature 509, 371–375 (2014).
Durruthy-Durruthy, R. et al. Reconstruction of the mouse otocyst and early neuroblast lineage at single-cell resolution. Cell 157, 964–978 (2014).
Moignard, V. et al. Characterization of transcriptional networks in blood stem and progenitor cells using high-throughput single-cell gene expression analysis. Nature Cell Biol. 15, 363–372 (2013).
Mahata, B. et al. Single-cell RNA sequencing reveals T helper cells synthesizing steroids de novo to contribute to immune homeostasis. Cell Rep. 7, 1130–1142 (2014). This paper provides an example from T cell biology that shows how gene–gene correlations in scRNA-seq studies can be used to reveal novel mechanistic insights.
Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nature Biotech. 32, 381–386 (2014). This paper describes a computational approach to reconstruct a pseudotemporal order from multiple scRNA-seq snapshot experiments, for example, along a differentiation trajectory.
Lovatt, D. et al. Transcriptome in vivo analysis (TIVA) of spatially defined single cells in live tissue. Nature Methods 11, 190–196 (2014).
Pettit, J. B., Tomer, R., Achim, K., Azizi, L. & Marioni, J. C. Identifying cell types from spatially referenced single-cell expression datasets. PLoS Comput. Biol. 10, e1003824 (2014).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Hardcastle, T. J. & Kelly, K. A. baySeq: empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinformatics 11, 422 (2010).
Shalek, A. K. et al. Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells. Nature 498, 236–240 (2013).
Anders, S., Reyes, A. & Huber, W. Detecting differential usage of exons from RNA-seq data. Genome Res. 22, 2008–2017 (2012).
Katz, Y., Wang, E. T., Airoldi, E. M. & Burge, C. B. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nature Methods 7, 1009–1015 (2010).
Grun, D., Kester, L. & van Oudenaarden, A. Validation of noise models for single-cell transcriptomics. Nature Methods 11, 637–640 (2014).
Friedman, J., Hastie, T. & Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441 (2008).
Segal, E. et al. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nature Genet. 34, 166–176 (2003).
Liao, J. C. et al. Network component analysis: reconstruction of regulatory signals in biological systems. Proc. Natl Acad. Sci. USA 100, 15522–15527 (2003).
Bansal, M., Belcastro, V., Ambesi-Impiombato, A. & di Bernardo, D. How to infer gene networks from expression profiles. Mol. Syst. Biol. 3, 78 (2007).
Pe'er, D., Regev, A., Elidan, G. & Friedman, N. Inferring subnetworks from perturbed expression profiles. Bioinformatics 17, S215–S224 (2001).
Kim, J. K. & Marioni, J. C. Inferring the kinetics of stochastic gene expression from single-cell RNA-sequencing data. Genome Biol. 14, R7 (2013).
Raj, A., Peskin, C. S., Tranchina, D., Vargas, D. Y. & Tyagi, S. Stochastic mRNA synthesis in mammalian cells. PLoS Biol. 4, e309 (2006).
Kaern, M., Elston, T. C., Blake, W. J. & Collins, J. J. Stochasticity in gene expression: from theories to phenotypes. Nature Rev. Genet. 6, 451–464 (2005).
Larson, D. R. What do expression dynamics tell us about the mechanism of transcription? Curr. Opin. Genet. Dev. 21, 591–599 (2011).
Schwanhausser, B. et al. Global quantification of mammalian gene expression control. Nature 473, 337–342 (2011).