destiny: diffusion maps for large-scale single-cell data in R (original) (raw)

Journal Article

,

1Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany and

Search for other works by this author on:

,

1Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany and

Search for other works by this author on:

,

1Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany and

Search for other works by this author on:

,

1Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany and

2Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Boltzmannstr. 3, 85748 Garching, Germany

Search for other works by this author on:

,

1Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany and

*To whom correspondence should be addressed.

Search for other works by this author on:

1Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany and

†Present address: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK.

Search for other works by this author on:

Revision received:

28 October 2015

Accepted:

01 December 2015

Published:

14 December 2015

Cite

Philipp Angerer, Laleh Haghverdi, Maren Büttner, Fabian J. Theis, Carsten Marr, Florian Buettner, destiny: diffusion maps for large-scale single-cell data in R, Bioinformatics, Volume 32, Issue 8, April 2016, Pages 1241–1243, https://doi.org/10.1093/bioinformatics/btv715
Close

Navbar Search Filter Mobile Enter search term Search

Abstract

Summary: Diffusion maps are a spectral method for non-linear dimension reduction and have recently been adapted for the visualization of single-cell expression data. Here we present destiny, an efficient R implementation of the diffusion map algorithm. Our package includes a single-cell specific noise model allowing for missing and censored values. In contrast to previous implementations, we further present an efficient nearest-neighbour approximation that allows for the processing of hundreds of thousands of cells and a functionality for projecting new data on existing diffusion maps. We exemplarily apply destiny to a recent time-resolved mass cytometry dataset of cellular reprogramming.

Availability and implementation: destiny is an open-source R/Bioconductor package “bioconductor.org/packages/destiny” also available at www.helmholtz-muenchen.de/icb/destiny. A detailed vignette describing functions and workflows is provided with the package.

Contact: carsten.marr@helmholtz-muenchen.de or f.buettner@helmholtz-muenchen.de

Supplementary information: Supplementary data are available at Bioinformatics online.

1 Introduction

Recent technological advances allow for the profiling of individual cells, using methods such as single-cell RNA-seq, single-cell RT qPCR or cyTOF (Roditi et al., 2015). These techniques have been used successfully to study stem cell differentiation with time-resolved single-cell experiments, where individual cells are collected at different absolute times within the differentiation process and profiled. While differentiation is a smooth but nonlinear process (Buettner and Theis, 2012; Haghverdi et al., 2015) involving continuous changes of the overall transcriptional state, standard methods for visualizing such data are either based on linear methods such as Principal Component Analysis (see Supplementary Fig. S1) and Independent Components Analysis or they use clustering techniques not accounting for the smooth nature of the data.

In contrast, diffusion maps—initially designed by Coifman et al. (2005) for dimensionality reduction in image processing—recover a distance measure between each pair of data points (cells) in a low dimensional space that is based on the transition probability from one cell to the other through several paths of a random walk. Diffusion maps are especially suited for analysing single-cell gene expression data from differentiation experiments (such as time-course experiments) for three reasons. First, they preserve the global relations between data points. This feature makes it possible to reconstruct developmental traces by re-ordering the asynchronously differentiating cells according to their internal differentiation state. Second, the notion of diffusion distance is robust to noise, which is ubiquitous in single-cell data. Third, by normalizing for sampling density, diffusion maps become insensitive to the distribution of the data points (i.e. sampling density), which aids the detection of rare cell populations.

Here, we present a user friendly R implementation of diffusion maps including previously proposed adaptations to single cell data (Haghverdi et al., 2015) as well as novel functionality. The latter includes approximations allowing for the visualization of large data sets and the projection of new data on existing maps.

2 Description: the destiny package

2.1 Algorithm

As input, destiny accepts an expression matrix or data structure extended with annotation columns. Gene expression data should be pre-processed and normalized using standard workflows (see Supplementary Text S1) before generating the diffusion map. destiny calculates cell-to-cell transition probabilities based on a Gaussian kernel with width σ to create a sparse transition probability matrix M. If the user does not specify σ, destiny employs an estimation heuristic to derive this parameter (see Supplementary Text S2). In contrast to other implementations, destiny allows for the visualization of hundreds of thousands of cells by only using distances to the k nearest neighbors of each cell for the estimation of M (see Supplementary Text S2). Optionally destiny uses an application-specific noise model for censored and missing values in the dataset (see Supplementary Fig. S2). An eigendecomposition is performed on M after density normalization, considering only transition probabilities between different cells. By rotating M, a symmetric adjoint matrix can be used for a faster and more robust eigendecomposition (Coifman et al., 2008). The resulting data-structure contains the eigenvectors with decreasing eigenvalues as numbered diffusion components, the input parameters and a reference to the data.

2.2 Visualization and projection of new data

This data-structure can be easily plotted and colored using the parameters of provided plot methods. An automatic color legend integrated into R’s palette system facilitates the generation of publication-quality plots. A further new feature in destiny is the ability to integrate new experimental data in an already computed diffusion map. destiny provides a projection function to generate the coordinates for the new data without recalculating the diffusion map by computing the transition probabilities from new data points to the existing data points (see Supplementary Text S3).

3 Application

We applied destiny to four single-cell datasets of different size (hundreds to hundreds of thousands of cells) and characteristics (qRT-PCR, RNA-Seq and mass cytometry, see Supplementary Table S1). We first estimate the optimal σ that matches the intrinsic dimensionality of the data (Fig. 1A and Supplementary Figs S3A and Supplementary Data). Using a scree plot (Fig. 1B and Supplementary Figs S3B, Supplementary Data and Supplementary Data), the relevant diffusion components can be identified. However, for big datasets as the mass cytometry data from Zunder et al. (2015) with 256 000 cells and 36 markers, corresponding Eigenvalues decrease smoothly. Although only a part of the intrinsic dimensionality can be represented in a 3D plot, the diffusion map reveals interesting properties of the reprogramming dynamics (Fig. 1C and Supplementary Fig. S6). We compared _destiny_’s performance to other implementations, including our own in MATLAB (based on Maggioni code (http://www.math.duke.edu/∼mauro/code.html), published with Haghverdi et al., 2015) and the diffusionMap R package (Richards, 2014). destiny performs similarly well for small datasets, while outperforming other implementations for large datasets (Supplementary Table S1).

destiny applied to the mass cytometry reprogramming dataset of Zunder et al. (2015) with 36 markers and 256 000 cells. (A) The optimal Gaussian kernel width σ. (B) The Eigenvalues of the first 100 diffusion components decrease smoothly, indicating a large intrinsic dimensionality of the data. (C) The initial population of mouse embryonic fibroblasts (MEFs) is reprogrammed and profiled over 20 days. While a final cell population expressing stem cell markers is clearly separated, cells that revert to the MEF state are found proximal to the initial population in the diffusion map. Inset: destiny code to generate the diffusion map

Fig. 1.

destiny applied to the mass cytometry reprogramming dataset of Zunder et al. (2015) with 36 markers and 256 000 cells. (A) The optimal Gaussian kernel width σ. (B) The Eigenvalues of the first 100 diffusion components decrease smoothly, indicating a large intrinsic dimensionality of the data. (C) The initial population of mouse embryonic fibroblasts (MEFs) is reprogrammed and profiled over 20 days. While a final cell population expressing stem cell markers is clearly separated, cells that revert to the MEF state are found proximal to the initial population in the diffusion map. Inset: destiny code to generate the diffusion map

4 Discussion and conclusion

We present a user-friendly R package of the diffusion map algorithm adapted to single-cell gene expression data and include new features for efficient handling of large datasets and a projection functionality for new data. We illustrate the capabilities of our package by visualizing gene expression data of 250 000 cells and show that our package is able to reveal continuous state transitions. Together with an easy to use interface this facilitates the application of diffusion map as new analysis tool for single-cell gene expression data.

Acknowledgement

We thank Chris McGinnis (Seattle, USA), Vicki Moignard (Cambridge, UK), Eli Zunder and Garry Nolan (both Stanford, USA) for helpful comments on destiny.

Funding

Supported by the UK Medical Research Council (Career Development Award to FB), the Bavarian Research Network for Molecular Biosystems (BioSysNet) and the ERC (starting grant LatentCauses to FJT). MB is supported by a DFG Fellowship through the Graduate School of Quantitative Biosciences Munich (QBM).

Conflict of Interest: none declared.

References

(

2012

)

A novel approach for resolving differences in single-cell gene expression patterns from zygote to blastocyst

.

Bioinformatics

,

28

,

i626

i632

.

et al. . (

2008

)

Diffusion maps, reduction coordinates, and low dimensional representation of stochastic systems

.

Multisc. Model. Simul

.,

7

,

842

864

.

et al. . (

2005

)

Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps

.

Proc. Natl. Acad. Sci

.,

102

,

7426

7431

.

et al. . (

2015

)

Diffusion maps for high-dimensional single-cell analysis of differentiation data

.

Bioinformatics

.,

31

,

2989

2998

.

et al. . (

2015

)

Computational and experimental single cell biology techniques for the definition of cell type heterogeneity, interplay and intracellular dynamics

.

Curr. Opin. Biotechnol

.,

34

,

9

15

.

et al. . (

2015

)

A continuous molecular roadmap to iPSC reprogramming through progression analysis of single-cell mass cytometry

.

Cell Stem Cell

,

16

,

323

337

.

Author notes

Associate Editor: Ziv Bar-Joseph

© The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

Supplementary data

Citations

Views

Altmetric

Metrics

Total Views 27,164

22,253 Pageviews

4,911 PDF Downloads

Since 11/1/2016

Month: Total Views:
November 2016 33
December 2016 26
January 2017 80
February 2017 87
March 2017 239
April 2017 210
May 2017 358
June 2017 272
July 2017 197
August 2017 220
September 2017 178
October 2017 201
November 2017 155
December 2017 259
January 2018 368
February 2018 347
March 2018 277
April 2018 245
May 2018 306
June 2018 360
July 2018 353
August 2018 287
September 2018 218
October 2018 200
November 2018 227
December 2018 211
January 2019 298
February 2019 185
March 2019 304
April 2019 312
May 2019 310
June 2019 310
July 2019 379
August 2019 313
September 2019 282
October 2019 201
November 2019 246
December 2019 184
January 2020 306
February 2020 242
March 2020 201
April 2020 322
May 2020 224
June 2020 339
July 2020 262
August 2020 233
September 2020 250
October 2020 296
November 2020 300
December 2020 223
January 2021 231
February 2021 260
March 2021 332
April 2021 346
May 2021 277
June 2021 288
July 2021 263
August 2021 290
September 2021 322
October 2021 355
November 2021 331
December 2021 343
January 2022 360
February 2022 247
March 2022 425
April 2022 330
May 2022 385
June 2022 318
July 2022 309
August 2022 328
September 2022 356
October 2022 387
November 2022 346
December 2022 291
January 2023 293
February 2023 333
March 2023 402
April 2023 348
May 2023 307
June 2023 236
July 2023 273
August 2023 222
September 2023 225
October 2023 265
November 2023 272
December 2023 243
January 2024 435
February 2024 528
March 2024 915
April 2024 290
May 2024 330
June 2024 276
July 2024 362
August 2024 267
September 2024 186

Citations

371 Web of Science

×

Email alerts

Citing articles via

More from Oxford Academic