destiny: diffusion maps for large-scale single-cell data in R
Journal Article
Philipp Angerer 1, Laleh Haghverdi 1, Maren Büttner 1, Fabian J. Theis 1,2, Carsten Marr 1,*, Florian Buettner 1,†
1 Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
2 Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Boltzmannstr. 3, 85748 Garching, Germany
*To whom correspondence should be addressed.
†Present address: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK.
Revision received: 28 October 2015
Accepted: 01 December 2015
Published: 14 December 2015
Cite as: Philipp Angerer, Laleh Haghverdi, Maren Büttner, Fabian J. Theis, Carsten Marr, Florian Buettner, destiny: diffusion maps for large-scale single-cell data in R, Bioinformatics, Volume 32, Issue 8, April 2016, Pages 1241–1243, https://doi.org/10.1093/bioinformatics/btv715
Abstract
Summary: Diffusion maps are a spectral method for non-linear dimension reduction and have recently been adapted for the visualization of single-cell expression data. Here we present destiny, an efficient R implementation of the diffusion map algorithm. Our package includes a single-cell specific noise model that accounts for missing and censored values. In contrast to previous implementations, destiny further provides an efficient nearest-neighbour approximation that allows hundreds of thousands of cells to be processed, and functionality for projecting new data onto existing diffusion maps. As an example, we apply destiny to a recent time-resolved mass cytometry dataset of cellular reprogramming.
Availability and implementation: destiny is an open-source R/Bioconductor package (bioconductor.org/packages/destiny), also available at www.helmholtz-muenchen.de/icb/destiny. A detailed vignette describing functions and workflows is provided with the package.
Contact: carsten.marr@helmholtz-muenchen.de or f.buettner@helmholtz-muenchen.de
Supplementary information: Supplementary data are available at Bioinformatics online.
1 Introduction
Recent technological advances allow for the profiling of individual cells, using methods such as single-cell RNA-seq, single-cell RT-qPCR or mass cytometry (CyTOF) (Roditi et al., 2015). These techniques have been used successfully to study stem cell differentiation in time-resolved single-cell experiments, where individual cells are collected and profiled at different absolute time points of the differentiation process. Although differentiation is a smooth but nonlinear process (Buettner and Theis, 2012; Haghverdi et al., 2015) involving continuous changes in the overall transcriptional state, standard methods for visualizing such data either rely on linear methods such as principal component analysis (see Supplementary Fig. S1) and independent component analysis, or use clustering techniques that do not account for the smooth nature of the data.
In contrast, diffusion maps, initially designed by Coifman et al. (2005) for dimensionality reduction in image processing, recover in a low-dimensional space a distance between each pair of data points (cells) that is based on the probability of moving from one cell to the other along all possible paths of a random walk. Diffusion maps are especially suited for analysing single-cell gene expression data from differentiation experiments (such as time-course experiments) for three reasons. First, they preserve the global relations between data points, which makes it possible to reconstruct developmental trajectories by re-ordering asynchronously differentiating cells according to their internal differentiation state. Second, the diffusion distance is robust to noise, which is ubiquitous in single-cell data. Third, density normalization makes diffusion maps insensitive to the sampling density of the data points, which aids the detection of rare cell populations.
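A minimal Python/NumPy sketch can make the random-walk distance concrete. It is not destiny's R code: it uses a dense Gaussian kernel on a hypothetical toy trajectory of ten evenly spaced cells, omits density normalization, and computes the t-step diffusion distance D_t(i, j)² = Σ_k (P^t_ik − P^t_jk)² / π_k, where P is the row-normalized kernel (the transition matrix) and π its stationary distribution. The function name `diffusion_distances` and the values of σ and t are illustrative assumptions.

```python
import numpy as np


def diffusion_distances(X, sigma, t):
    """Pairwise t-step diffusion distances between the rows of X (toy, dense)."""
    # Squared Euclidean distances between all pairs of cells
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian kernel with width sigma
    K = np.exp(-sq / (2 * sigma ** 2))
    # Row-normalize to obtain the random-walk transition matrix P
    d = K.sum(axis=1)
    P = K / d[:, None]
    # For a symmetric kernel, the stationary distribution is proportional to d
    pi = d / d.sum()
    # t-step transition probabilities aggregate all paths of length t
    Pt = np.linalg.matrix_power(P, t)
    # D_t(i, j)^2 = sum_k (Pt[i, k] - Pt[j, k])^2 / pi[k]
    diff = Pt[:, None, :] - Pt[None, :, :]
    return np.sqrt(np.sum(diff ** 2 / pi[None, None, :], axis=-1))


# Hypothetical toy data: ten "cells" evenly spaced along a 1D trajectory
X = np.arange(10, dtype=float).reshape(-1, 1)
D = diffusion_distances(X, sigma=1.5, t=3)
print(D[0, 1], D[0, 9])
```

Neighbouring cells obtain a small diffusion distance and the two ends of the trajectory a large one, because their t-step transition profiles barely overlap.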
Here, we present a user-friendly R implementation of diffusion maps that includes previously proposed adaptations to single-cell data (Haghverdi et al., 2015) as well as novel functionality: approximations that allow large datasets to be visualized, and the projection of new data onto existing maps.
2 Description: the destiny package
2.1 Algorithm
As input, destiny accepts an expression matrix or a data structure that additionally contains annotation columns. Gene expression data should be pre-processed and normalized using standard workflows (see Supplementary Text S1) before the diffusion map is generated. destiny calculates cell-to-cell transition probabilities based on a Gaussian kernel with width σ to create a sparse transition probability matrix M. If the user does not specify σ, destiny derives it with an estimation heuristic (see Supplementary Text S2). In contrast to other implementations, destiny can visualize hundreds of thousands of cells because it uses only the distances to the k nearest neighbors of each cell to estimate M (see Supplementary Text S2). Optionally, destiny applies an application-specific noise model for censored and missing values in the dataset (see Supplementary Fig. S2). After density normalization, an eigendecomposition of M is performed that considers only transition probabilities between different cells, i.e. self-transitions are excluded. Because M can be transformed into a symmetric adjoint matrix with the same eigenvalues, the eigendecomposition can be carried out on that symmetric matrix, which is faster and more robust (Coifman et al., 2008). The resulting data structure contains the eigenvectors, ordered by decreasing eigenvalue, as numbered diffusion components, together with the input parameters and a reference to the data.
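The pipeline in this paragraph can be sketched in Python/NumPy as follows. This is an illustration under stated assumptions, not destiny's implementation: the neighbour search is brute force, matrices are dense rather than sparse, the k-nearest-neighbour kernel is symmetrized by an element-wise maximum, density normalization is assumed to take the form K̃_ij = K_ij / (z_i z_j) with z_i = Σ_j K_ij, the noise model for censored and missing values is omitted, and the function names and toy data are hypothetical. The symmetric adjoint is A = D^(-1/2) K̃ D^(-1/2), with D the diagonal matrix of row sums of K̃; if A v = λ v, then φ = D^(-1/2) v satisfies M φ = λ φ.

```python
import numpy as np


def knn_gaussian_kernel(X, k, sigma):
    """Gaussian kernel kept only between each cell and its k nearest neighbours."""
    n = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)  # a cell is not its own neighbour
    # Brute-force k-nearest-neighbour search (destiny uses an approximation instead)
    nn = np.argpartition(sq, k, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = nn.ravel()
    K = np.zeros((n, n))
    K[rows, cols] = np.exp(-sq[rows, cols] / (2 * sigma ** 2))
    # Assumed symmetrization: keep an edge if either cell lists the other
    return np.maximum(K, K.T)


def diffusion_map(X, k=10, sigma=1.0, n_eigs=3):
    """Return M and the leading non-trivial eigenvalues/eigenvectors of M."""
    K = knn_gaussian_kernel(X, k, sigma)   # zero diagonal: no self-transitions
    z = K.sum(axis=1)
    K_tilde = K / np.outer(z, z)           # density normalization
    d = K_tilde.sum(axis=1)
    M = K_tilde / d[:, None]               # row-stochastic transition matrix
    s = np.sqrt(d)
    A = K_tilde / np.outer(s, s)           # symmetric adjoint, same eigenvalues as M
    evals, evecs = np.linalg.eigh(A)       # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    phi = evecs / s[:, None]               # right eigenvectors of M
    # Skip the trivial eigenpair (eigenvalue 1, constant eigenvector)
    return M, evals[1:n_eigs + 1], phi[:, 1:n_eigs + 1]


# Hypothetical toy trajectory: 200 noisy cells along a curve in 3D, ordered by t
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 200)
X = np.column_stack([np.cos(3 * t), np.sin(3 * t), t])
X += rng.normal(scale=0.02, size=X.shape)
M, evals, dcs = diffusion_map(X, k=10, sigma=0.1)
print(evals)
```

Because A is symmetric, `numpy.linalg.eigh` can be used instead of a general eigensolver on M, which reflects the speed and robustness argument for the adjoint matrix. On this toy trajectory the first diffusion component orders the cells along t.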
2.2 Visualization and projection of new data
This data structure can be plotted directly and colored via the parameters of the provided plot methods. An automatic color legend integrated into R’s palette system facilitates the generation of publication-quality plots. A further new feature of destiny is the ability to integrate new experimental data into an already computed diffusion map. destiny provides a projection function that generates coordinates for the new data without recalculating the diffusion map, by computing the transition probabilities from the new data points to the existing data points (see Supplementary Text S3).
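One way to realize such a projection is a Nyström-type extension: for a new cell y, compute its kernel weights to the existing cells, apply the same density normalization, row-normalize them to transition probabilities m_y, and set ψ(y) = (1/λ) Σ_j m_y(j) ψ_j for each diffusion component ψ with eigenvalue λ. The Python/NumPy sketch below assumes this formulation; it is not necessarily destiny's exact procedure (described in Supplementary Text S3), it uses a dense kernel instead of the nearest-neighbour approximation for brevity, and the names `fit_dm`, `project` and `gaussian_kernel` are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Dense Gaussian kernel between the rows of A and the rows of B."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))


def fit_dm(X, sigma, n_eigs=2):
    """Toy dense diffusion map of the reference cells X."""
    K = gaussian_kernel(X, X, sigma)
    np.fill_diagonal(K, 0.0)                # only transitions between different cells
    z = K.sum(axis=1)
    K_tilde = K / np.outer(z, z)            # density normalization
    s = np.sqrt(K_tilde.sum(axis=1))
    evals, evecs = np.linalg.eigh(K_tilde / np.outer(s, s))
    order = np.argsort(evals)[::-1][1:n_eigs + 1]  # skip the trivial eigenpair
    return {"X": X, "sigma": sigma, "z": z,
            "evals": evals[order], "phi": evecs[:, order] / s[:, None]}


def project(dm, Y):
    """Nystrom-style coordinates of new cells Y, without refitting the map."""
    K_new = gaussian_kernel(Y, dm["X"], dm["sigma"])  # new-to-existing kernel
    # Divide by the existing cells' kernel sums; the new cell's own density
    # factor would cancel in the row normalization below
    K_new = K_new / dm["z"][None, :]
    M_new = K_new / K_new.sum(axis=1, keepdims=True)  # transition probabilities
    # psi(y) = (1 / lambda) * sum_j M_new[y, j] * phi[j]
    return (M_new @ dm["phi"]) / dm["evals"][None, :]


X = np.arange(20, dtype=float).reshape(-1, 1)  # 20 reference cells on a line
dm = fit_dm(X, sigma=1.5)
Y = np.array([[2.5], [16.5]])                  # two new cells, mirror images
coords = project(dm, Y)
print(coords)
```

Only the new-to-existing kernel is evaluated, so the cost of a projection scales with the number of new cells times the number of reference cells, and the existing eigenvectors are reused unchanged.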
3 Application
We applied destiny to four single-cell datasets of different sizes (hundreds to hundreds of thousands of cells) and characteristics (qRT-PCR, RNA-Seq and mass cytometry; see Supplementary Table S1). We first estimate the optimal σ, i.e. the one that matches the intrinsic dimensionality of the data (Fig. 1A and Supplementary Figs S3A and Supplementary Data). Using a scree plot (Fig. 1B and Supplementary Figs S3B, Supplementary Data and Supplementary Data), the relevant diffusion components can be identified. However, for large datasets such as the mass cytometry data of Zunder et al. (2015), with 256 000 cells and 36 markers, the corresponding eigenvalues decrease smoothly, indicating a large intrinsic dimensionality. Although only part of the intrinsic dimensionality can be represented in a 3D plot, the diffusion map reveals interesting properties of the reprogramming dynamics (Fig. 1C and Supplementary Fig. S6). We compared destiny’s performance to that of other implementations, including our own MATLAB implementation (based on code by Maggioni (http://www.math.duke.edu/∼mauro/code.html), published with Haghverdi et al., 2015) and the diffusionMap R package (Richards, 2014). destiny performs comparably on small datasets and outperforms the other implementations on large datasets (Supplementary Table S1).
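The σ estimate can be illustrated with the log-log slope idea described by Coifman et al. (2008): for a grid of σ values, sum all kernel entries, S(σ) = Σ_ij exp(−‖x_i − x_j‖² / (2σ²)); in the range where the kernel resolves the data, d log S / d log σ approximates the intrinsic dimensionality, and σ is chosen where this slope is largest. The Python/NumPy sketch below assumes that destiny's heuristic follows this idea (Supplementary Text S2 describes the actual procedure); the σ grid, the toy data, the inclusion of self-pairs in S and the function name `sigma_by_max_slope` are illustrative assumptions.

```python
import numpy as np


def sigma_by_max_slope(X, sigmas):
    """Return (sigma, slope) where d log S / d log sigma is largest."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # S(sigma): sum of all kernel entries, including self-pairs
    S = np.array([np.exp(-sq / (2 * s ** 2)).sum() for s in sigmas])
    # Finite-difference slope on the log-log scale
    slopes = np.diff(np.log(S)) / np.diff(np.log(sigmas))
    i = int(np.argmax(slopes))
    # Report the geometric midpoint of the interval with the largest slope
    return float(np.sqrt(sigmas[i] * sigmas[i + 1])), float(slopes[i])


rng = np.random.default_rng(0)
sigmas = np.logspace(-3, 0.5, 40)
line = rng.uniform(size=(400, 1))    # intrinsically 1-dimensional toy data
square = rng.uniform(size=(400, 2))  # intrinsically 2-dimensional toy data
print(sigma_by_max_slope(line, sigmas))
print(sigma_by_max_slope(square, sigmas))
```

The maximal slope stays somewhat below the true dimension because self-pairs and boundary effects flatten the curve, but it separates the one- and two-dimensional toy datasets.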
Fig. 1.
destiny applied to the mass cytometry reprogramming dataset of Zunder et al. (2015) with 36 markers and 256 000 cells. (A) The optimal Gaussian kernel width σ. (B) The eigenvalues of the first 100 diffusion components decrease smoothly, indicating a large intrinsic dimensionality of the data. (C) The initial population of mouse embryonic fibroblasts (MEFs) is reprogrammed and profiled over 20 days. While a final cell population expressing stem cell markers is clearly separated, cells that revert to the MEF state are found close to the initial population in the diffusion map. Inset: destiny code used to generate the diffusion map.
4 Discussion and conclusion
We present a user-friendly R package implementing the diffusion map algorithm adapted to single-cell gene expression data, with new features for the efficient handling of large datasets and a projection functionality for new data. We illustrate the capabilities of our package by visualizing mass cytometry data of more than 250 000 cells and show that it reveals continuous state transitions. Together with its easy-to-use interface, this facilitates the application of diffusion maps as a new analysis tool for single-cell gene expression data.
Acknowledgement
We thank Chris McGinnis (Seattle, USA), Vicki Moignard (Cambridge, UK), Eli Zunder and Garry Nolan (both Stanford, USA) for helpful comments on destiny.
Funding
Supported by the UK Medical Research Council (Career Development Award to FB), the Bavarian Research Network for Molecular Biosystems (BioSysNet) and the ERC (starting grant LatentCauses to FJT). MB is supported by a DFG Fellowship through the Graduate School of Quantitative Biosciences Munich (QBM).
Conflict of Interest: none declared.
References
Buettner and Theis (2012) A novel approach for resolving differences in single-cell gene expression patterns from zygote to blastocyst. Bioinformatics, 28, i626–i632.
Coifman et al. (2008) Diffusion maps, reduction coordinates, and low dimensional representation of stochastic systems. Multiscale Model. Simul., 7, 842–864.
Coifman et al. (2005) Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proc. Natl. Acad. Sci., 102, 7426–7431.
Haghverdi et al. (2015) Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics, 31, 2989–2998.
Roditi et al. (2015) Computational and experimental single cell biology techniques for the definition of cell type heterogeneity, interplay and intracellular dynamics. Curr. Opin. Biotechnol., 34, 9–15.
Zunder et al. (2015) A continuous molecular roadmap to iPSC reprogramming through progression analysis of single-cell mass cytometry. Cell Stem Cell, 16, 323–337.
Author notes
Associate Editor: Ziv Bar-Joseph
© The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com