Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq - PubMed (original) (raw)

. 2014 May 15;509(7500):371-5.

doi: 10.1038/nature13173. Epub 2014 Apr 13.

Affiliations

Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq

Barbara Treutlein et al. Nature. 2014.

Abstract

The mammalian lung is a highly branched network in which the distal regions of the bronchial tree transform during development into a densely packed honeycomb of alveolar air sacs that mediate gas exchange. Although this transformation has been studied by marker expression analysis and fate-mapping, the mechanisms that control the progression of lung progenitors along distinct lineages into mature alveolar cell types are still incompletely known, in part because of the limited number of lineage markers and the effects of ensemble averaging in conventional transcriptome analysis experiments on cell populations. Here we show that single-cell transcriptome analysis circumvents these problems and enables direct measurement of the various cell types and hierarchies in the developing lung. We used microfluidic single-cell RNA sequencing (RNA-seq) on 198 individual cells at four different stages encompassing alveolar differentiation to measure the transcriptional states which define the developmental and cellular hierarchy of the distal mouse lung epithelium. We empirically classified cells into distinct groups by using an unbiased genome-wide approach that did not require a priori knowledge of the underlying cell types or the previous purification of cell populations. The results confirmed the basic outlines of the classical model of epithelial cell-type diversity in the distal lung and led to the discovery of many previously unknown cell-type markers, including transcriptional regulators that discriminate between the different populations. We reconstructed the molecular steps during maturation of bipotential progenitors along both alveolar lineages and elucidated the full life cycle of the alveolar type 2 cell lineage. This single-cell genomics approach is applicable to any developing or mature tissue to robustly delineate molecularly distinct cell types, define progenitors and lineage hierarchies, and identify lineage-specific regulatory factors.

PubMed Disclaimer

Conflict of interest statement

Conflict of Interest: S.R. Quake is a founder and consultant for Fluidigm Corporation.

Figures

Extended Data Figure 1

Extended Data Figure 1. Schematic illustration of the process of sacculation

(a) Schematic illustration of morphological and molecular changes of the distal airways during development. Cell differentiation progresses in a directional manner from the bronchio-alveolar junction (proximal) to the distal tip (distal) of each terminal airway, and therefore progenitor cells persist the longest at the tips. Ciliated (green) and Clara (blue) cells mature first, followed by differentiation of flat alveolar type 1 (AT1, orange) and cuboidal type 2 (AT2, red) cells from cuboidal alveolar progenitors during sacculation (embryonic day (E) 16-18.5), when distal airway tubules widen as nascent AT1 cells flatten to form the gas exchange surface. (b) Micrographs of alveolar (E18.5, post-natal 3 days (PN3d)) and bronchiolar (PN3d) sections of a mouse lung co-stained for Clara (Scgb1a1, green) and ciliated (Foxj1, red) cell markers as well as AT1 (Pdpn, green) and AT2 (Sftpc, red) specific markers. Progenitor cells at the tips of sacculating alveoli are detected by an overlap of AT1 and AT2 specific markers. Newly forming alveolar sacs are marked by asterisks.

Extended Data Figure 2

Extended Data Figure 2. Single cell transcriptomics analysis workflow

(a) Workflow of single cell transcriptomics analysis of mouse lung epithelial cells. A single captured lung epithelial cell stained with Alexa488 for EpCAM (green) is indicated by a red arrow. (b) Single lung epithelial cells captured in microfluidic chips with capture sites designed to trap cells with 10-17 µm (medium, left) or 17-25 µm (large, right) diameter. Cells are stained for viability using Calcein AM. Even cells captured by the large chip did not exceed a diameter of ~15 µm indicating that the medium sized chips are sufficient for comprehensively profiling distal mouse lung epithelial cells.

Extended Data Figure 3

Extended Data Figure 3. Assessment of required sequencing depth, technical and biological variation, dynamic range and reproducibility of single cell RNA-seq data of 80 single distal lung epithelial cells at E18.5

(a) Saturation analysis reveals the sequencing depth required for the detection of most genes expressed by single cells. To detect most expressed genes, single cell RNA-seq libraries have to be sequenced only to a depth of about 1 million reads, whereas libraries of bulk samples have to be sequenced deeper. The number of genes detected in the ensemble of all single cells (synthetic bulk) is comparable to the number of genes detected in the true bulk experiment. Each point on the saturation curve was generated by randomly selecting a number of raw reads from each sample library (bulk: 200 cell bulk library, single cell: single cell RNA-seq libraries of 80 lung epithelial cells, single cell ensemble: bioinformatically pooled single cell libraries) and then using the same alignment pipeline to call genes with mean FPKM >1. Each point represents four replicate sub-samplings, error bars represent standard errors. (b) Technical noise and biological variation in single cell RNA-seq data. Relationship between mean expression level and coefficient of variation for 10,946 genes in single embryonic lung epithelial cells. Several genes exhibit strong biological variation (blue), as they exhibit higher variability than the average noise at a given average gene expression. Housekeeping genes are shown in yellow. (c) Average detected transcript levels (mean FPKM, log2) for 92 ERCC RNA spike-ins as a function of provided number of molecules per lysis reaction for each of the three independent single cell RNA-seq experiments performed at E18.5. Linear regression fits through data points are shown. The length of each ERCC RNA spike-in transcript is encoded in the size and color of the data points. No particular bias towards the detection of shorter versus longer transcripts is observed. The method shows single transcript sensitivity as well as a dynamic range of approximately 6 orders of magnitude, in agreement with a previous study evaluating microfluidic single cell RNA-seq. (d,e) Correlation between (d) transcript levels of a 200-cell population and median transcript levels of single cells of the same pool of embryonic lungs, and (e) transcript levels of two single AT2 cells. r, Pearson correlation coefficients. (f,g) Correlation between (f) transcript levels of all genes detected in the single lung and the pooled lung experiment and between (g) transcript levels of all genes detected in the two independent experiments on pooled embryonic lungs. Pearson correlation coefficients r are given.

Extended Data Figure 4

Extended Data Figure 4. Lineage-specific genes identified by single cell transcriptome analysis allow functional description of individual distal lung epithelial cell populations

(a) Results of gene ontology (GO) and KEGG pathway enrichment analyses for distal lung epithelial cell types based on lineage-specific genes identified by single cell RNA-seq of 80 E18.5 distal lung epithelial cells (Supplementary Data 5). (b,c) Correlograms visualizing correlation of single cell gene expression profiles between (b) transcription factors or (c) receptors/ligands and the major canonical marker genes for bronchiolar and alveolar lineages (AT1: Pdpn, AT2: Sftpc, Clara: Scgb1a1, ciliated: Foxj1). The color bar denotes the Pearson correlation coefficient from −1 (blue, anticorrelated genes) to 1 (green, positively correlated genes). (d) Validation of novel marker genes by single cell multiplexed qPCR on 74 single cells isolated from the distal mouse lung epithelium at E18.5. Lineage-specific expression of seven new marker genes is shown by clustering with known markers for respective lineages (AT2, red, novel: Cftr, Cebpa, Sftpd, Id2), (AT1, orange, novel: Vegfa), (ciliated, green, novel: Itgb4, Top2a), (Clara, blue). (e) Validation of Hopx expression in AT1 cells. A lung section from a transgenic Hopx > GFP adult mouse (Hopx-Cre-ERT2+/−; mTmG+/tg) was co-stained for AT1 marker Pdpn. Maximum intensity projections of confocal z-stacks show that AT1 cells expressing the membrane-localized GFP reporter (green) also express Pdpn (white). Scale bar 50 µm. (f) Hierarchical clustering of 46 transgenically labeled mature Sftpc+ AT2 cells, isolated by FACS from adult mouse lung. Most genes identified as AT2 lineage-specific from single cell transcriptomes at E18.5 are transcribed also by mature AT2 cells. In contrast, no or low expression is observed in mature AT2 cells for the genes specific to the other alveolar or bronchiolar lineages as identified from single cell RNA-seq data at E18.5.

Extended Data Figure 5

Extended Data Figure 5. Molecular profiles distinguish developmental intermediates during the differentiation of AT1 and AT2 cells from a common bipotential progenitor

(a) Hierarchical clustering of multiplexed qPCR gene expression data for 33 single cells from E16.5 lung epithelium (CD45−/EpCAM+) suggests the presence at this time point of two major cell lineages, bronchiolar (cyan) and alveolar (brown) progenitors. Note that alveolar progenitors express a subset of both AT1 and AT2 marker genes. (b) PCA of multiplexed qPCR data of lung epithelial cells at E16.5 identifies two gene groups in contrast to three observed at E18.5 (figure 1c). AT1 and AT2 specific marker genes do not segregate into distinct populations at E16.5. (c) Hierarchical clustering of multiplexed qPCR gene expression data for 74 single embryonic lung epithelial cells (CD45−/EpCAM+) at E18.5 shows multiple distinct cell populations consistent with RNA-sequencing data at this time point: BP, AT1, AT2, Clara and ciliated cells. Each row represents a single cell and each column a gene. Cells are clustered based on expression of marker genes for alveolar and bronchiolar lineages (AT2: Abca3, Sftpb, Muc1, Lyz2, Sftpc; AT1: Aqp5, Pdpn, Ager; ciliated: Foxj1; Clara: Scgb1a1). (d) PCA of multiplexed qPCR data replicates gene families found by single cell RNA-seq at E18.5. Gene groups were characterized based on differential correlation with the first two principal components. (e) Developmental sequence of AT1 (orange) and AT2 (red) specification from a common BP (brown). Two and three maturation intermediates were identified in the specification process of AT2 and AT1 cell types, respectively, based on the expression of known and novel marker genes for both alveolar lineages measured by single cell RNA-seq (Figure 3). Transcription factors and receptors/ligands shown here were found to be expressed in BP cells and subsequently restricted to one of the alveolar lineages. Arrows, differentiation pathway; gray braces, change in transcript level of respective genes with tip pointing towards lower expression. (f-i) Protein level heterogeneity of alveolar epithelial markers during sacculation. (f) Immunofluorescent micrograph from an E19.5 lung with mature AT1 and AT2 cells stained for their respective markers (Pdpn (white) and Ager (red) for AT1, Sftpc (green) for AT2). BPs are positive for all three markers. Cells in intermediate states are observed, such as early AT1 (Pdpn and Ager positive, Sftpc low) and early AT2 cells (Sftpc positive, and either Pdpn positve/Ager low or Pdpn low/Ager negative). Scale bar, 10µm. (g) Markers of late AT2 cells are heterogeneously expressed at E18.5. Immunofluorescent micrograph of a lung from a Lyz2-eGFP transgenic mouse, in which within the epithelium (E-Cadherin, blue) only a subset of Sftpc (green) positive AT2 cells are Lyz2 (red) positive. Scale bar, 20 µm. (h) Immunofluorescent staining of E18.5 lung tissue for Lamp3 (red) shows heterogeneous expression of Lamp3 in Sftpc-positve cells (green): Proximal cells show higher Lamp3 expression than distal cells. Blue, DAPI-stained nuclei, scale bar, 20 µm. (i) Immunofluorescent staining of E18.5 lung tissue for S100a6 (red) shows heterogeneous expression of the secreted protein S100a6 in Pdpn-positve cells (green). Blue, DAPI-stained nuclei, scale bar, 20 µm.

Extended Data Figure 6

Extended Data Figure 6. Following _Sftpc_-expressing cells throughout their lifecycle

(a) Whole mount in situ hybridizations of embryonic mouse lungs at E11.5, E13.5 and E14.5 using probes against Sftpc mRNA show expression of Sftpc specific to the tips of the epithelial tree branches. Moreover, variations in signal intensity indicate heterogeneity in the level of Sftpc expression across cells, which is in agreement with our single cell RNA-seq data of Sftpc+ cells at E14.5 (see Figure 4a). (b) Schematic of the different transcriptional states in the specification of an AT2 cell as identified by single cell RNA-seq of Sftpc+ cells from distal mouse lung epithelium of embryonic (E14.5, E16.5, E18.5) and adult mice. The cell transitions from an early (A) and late (B) early progenitor state into a bipotential progenitor state before either taking the AT1 fate (nascent AT1), or following along the AT2 pathway to become a nascent and finally a mature AT2 cell. Groups of genes turning on/up or off/down during the individual transitions are shown above and below each arrow, respectively (Figure 4a and Supplementary Data 6). Whereas EP and BP cells are double positive for Sftpc and Pdpn, nascent and mature AT2 cells express Sftpc but turn off expression of the AT1 marker Pdpn. The developmental time points at which the individual cell states were detected, and putative locations are shown.

Extended Data Figure 7

Extended Data Figure 7. The number of unique genes and the total number of transcripts expressed by a single cell strongly correlates with its differentiation state

(a) Saturation analysis of single cell RNA-seq data of lung epithelial cells at different embryonic and adult time points (E14.5, E18.5, adult AT2) reveals that the number of unique genes expressed by single lung epithelial cells decreases with progressing differentiation state. Distal lung epithelial cells at E14.5 express over 6000 genes, whereas cells at E18.5 express approximately 3000 and mature AT2 cells only around 2000 genes. Each point on the saturation curve was generated by randomly selecting a number of raw reads from each sample library and then using the same alignment pipeline to call genes with mean FPKM >1. Each point represents four replicate sub-samplings. Error bars represent standard errors. All libraries were sequenced to a depth of at least 2 million reads. (b) Single cell RNA-seq reveals that the total number of transcripts expressed by single cells decreases with increasing differentiation state of the cell. Number of transcripts per cell were calculated from the FPKM values of all genes in each cell using the correlation between number of transcripts of exogenous spike-in mRNA sequences and their respective measured mean FPKM values (exemplary calibration curves are shown in Extended Data Figure 3c for three replicates at E18.5). Area normalized density distributions are shown for embryonic cells at E14.5 (45 cells), E16.5 (27 cells), E18.5 (80 cells) and for 46 Sftpc+ adult AT2 cells. The number of transcripts is highest in lung epithelial progenitor cells at E16.5 and E14.5 and decreases in cells at E18.5 and even further in mature AT2 cells. Note that single cell RNA-seq libraries for E14.5, E18.5 and adult AT2 cells were sequenced to a depth of 2-6 million reads, whereas the libraries for cells at E16.5 were sequenced to a lower depth of 100,000-550,000 reads. (c) Calibration of Ct values measured by single cell qPCR to number of molecules. Average detected transcript levels (log2Ex = CtLoD – Ct, CtLoD = 22) for 6 ERCC RNA spike-ins as a function of provided number of molecules per lysis reaction for each of three independent single cell qPCR experiments performed on embryonic (E16.5, 2 replicates, red and green) and adult mouse lung (adult AT2, 1 replicate, blue). Linear regression fits through data points and corresponding equations are shown and were used to convert CT values measured by qPCR into numbers of transcripts. (d) Single cell qPCR confirms the presence of a higher number of transcripts in lung epithelial progenitor cells as compared to fully differentiated alveolar epithelial cells. The median number of transcripts per cell as detected by single cell RNA-seq (y-axis) and by single cell multiplexed qPCR of 90 genes (x-axis) is shown for distal lung epithelial cells at E16.5 (qPCR: 33 cells, RNA-seq: 27 cells) and mature AT2 cells (qPCR: 48 cells, RNA-seq: 46 cells).

Extended Data Figure 8

Extended Data Figure 8. Transcriptional states during the early lifetime of the Clara cell lineage identified by single cell RNA-seq of Scgb3a2+ cells at E14.5, E16.5 and E18.5

(a) Hierarchical clustering of 24 Scgb3a2 positive cells from distal mouse lung epithelium at different embryonic time points (E14.5, E16.5, E18.5) based on the genes with highest PC loadings in an unbiased PCA analysis of all cells and all genes (panel c). Cells are shown in rows and genes are shown in columns. Cells cluster into 3 major groups. Scgb3a2 and Scgb1a1 transcript levels are shown in side-bars on the right. Whereas canonical Clara cell marker Scgb1a1 is first detected at E18.5, Scgb3a2 is detected as early as E14.5 suggesting to be an early Clara cell marker. (b) Gene Ontology (GO) enrichments of the three different gene clusters as well as transcription factors (TF) belonging to the different groups of genes. (c) PCA analysis of all Scgb3a2 positive cells and all genes identifies three different cell populations that were identified as bronchiolar progenitor as well as Clara and ciliated cells.

Figure 1

Figure 1. Single cell RNA-seq of 80 embryonic (E18.5) mouse lung epithelial cells enables unbiased identification of alveolar, bronchiolar and progenitor cell populations

(a) Spatially heterogeneous differentiation of distal lung epithelium. Micrograph of a newly forming alveolar sac (asterisk) and schematic below illustrate cell types and gradient of developmental intermediates comprising the distal lung epithelium during sacculation (E18.5). Micrograph: green, Pdpn, alveolar type 1 (AT1) marker; red, Sftpc, AT2 marker; white, E-Cadherin (Ecad), pan-epithelial marker). Bipotential progenitor cells (BP) are characterized by co-expression of AT1 and AT2 markers. Schematic: BPs (brown) persist at the tip, nascent AT2 (red) and AT1 (orange) cells are located more proximally. Ciliated (green) and Clara (blue) cells are located in the bronchiolar epithelium (not labeled in micrograph). Scale bar 75 µm. (b) Principal component analysis (PCA) of 80 single cell transcriptomes (3 biological replicates) at E18.5 distinguishes major bronchiolar and alveolar cell lineages. (c) Distinct gene groups characterize each cell population based on differential correlation with PC1 and PC3. Arrow tip denotes correlation coefficient of the respective gene with each PC.

Figure 2

Figure 2. Single cell transcriptome analysis discovers novel markers

(a) Hierarchical clustering of RNA-seq data from 80 single distal lung epithelial cells (E18.5, 3 biological replicates) identifies five molecularly distinct populations, assigned to alveolar and bronchiolar lineages based on the presence of canonical marker genes (asterisks) within the respective gene clusters (AT2 (red): Sftpb, Sftpc; AT1 (orange): Pdpn; ciliated (green): Foxj1; Clara (blue): Scgb1a1). BPs (brown) co-express AT1 and AT2 markers. Each row represents a single cell, each column a gene (104 genes in total, Supplementary Data 3). Permutation analysis supports the significance of the presented clustering (p-value = 2.89×10−122, Methods). (b) Bargraphs showing the top 30 putative marker genes for each cell lineage inferred from the E18.5 single cell transcriptomes as a function of the multiple testing corrected p-value for each gene (Guilt-by-Association, Methods). Canonical markers, bold and colored. (c) Validation of Hopx expression in AT1 cells. A lung section from a transgenic Hopx-Cre-ERT2+/−;mTmG+/tg adult mouse was co-stained for AT1 marker Ager. Maximum intensity projections of confocal z-stacks show that AT1 cells expressing membrane-localized GFP (green) also express Ager (white). Scale bar 50 µm. (d) Validation of Egfl6 expression in AT2 cells. Multiplexed in-situ hybridization of E18.5 lungs shows co-localization of probes targeting Egfl6 (green) and AT2 marker Sftpc (red) mRNA. Inset, close up of boxed region. Blue, DAPI-stained nuclei. Scale bar 50 µm. (e) Validation of Krt15 expression in Clara cells. Immunofluorescent staining of E18.5 lungs using antibodies against Krt15 (red) and Clara cell marker Scgb1a1 (green). Blue, DAPI-stained nuclei. Scale bar, 50µm.

Figure 3

Figure 3. Molecular profiles distinguish developmental intermediates during the differentiation of AT1 and AT2 cells from a common bipotential progenitor

Developmental sequence of AT1 (orange) and AT2 (red) specification from a common BP (brown). Two and three maturation intermediates were identified in the specification process of AT2 and AT1 cell types, respectively, based on expression of known and novel marker genes for both alveolar lineages measured by single cell RNA-seq. Genes were grouped into early and late markers of either lineage. Arrows, differentiation pathway; gray braces, change in transcript level of respective genes, tip pointing towards lower expression.

Figure 4

Figure 4. Single cell RNA-seq of Sftpc+ cells at E14.5, E16.5, E18.5 and in the adult mouse lung elucidates progressive transcriptional states of the AT2 cell lineage throughout its lifecycle

(a) Hierarchical clustering of 124 Sftpc+ cells from distal mouse lung epithelium of embryonic (E14.5, E16.5, E18.5) and adult mice based on genes with highest PC loadings (Supplementary Data 6) in an unbiased PCA analysis (panel b) of all cells and genes. Single cells are shown in rows, genes are shown in columns. Right side-bars show Sftpc and Pdpn expression, as well as the number of genes expressed by each single cell (see also Extended Data Figure 7). Functional gene ontology enrichments and transcription factors (TFs) specific to each gene group (bottom grey-shaded bars) are shown (Supplementary Data 6). A similar analysis following the Clara cell lineage throughout development is shown in Extended Data Figure 8. (b) PCA of single cell transcriptomes based on genes detected in more than two cells. Cells cluster into three major populations based on different scores along the first two principal components. (c) Violin plots depicting the course of expression of each of seven distinct gene groups across the 6 cell populations. Each violin plot shows the frequency distribution of the mean transcript level (log2-transformed FPKM) of all genes per cell.

Comment in

Similar articles

Cited by

References

    1. Kim CFB, et al. Identification of bronchioalveolar stem cells in normal lung and lung cancer. Cell. 2005;121:823–835. - PubMed
    1. Zemke AC, et al. Molecular staging of epithelial maturation using secretory cell-specific genes as markers. Am J Respir Cell Mol Biol. 2009;40:340–348. - PMC - PubMed
    1. Guha A, et al. Neuroepithelial body microenvironment is a niche for a distinct subset of Clara-like precursors in the developing airways. Proceedings of the National Academy of Sciences. 2012;109:12592–12597. - PMC - PubMed
    1. Gonzalez R, et al. Freshly isolated rat alveolar type I cells, type II cells, and cultured type II cells have distinct molecular phenotypes. Am. J. Physiol. Lung Cell Mol. Physiol. 2005;288:L179–L189. - PubMed
    1. Xu Y, et al. Transcriptional Programs Controlling Perinatal Lung Maturation. PLoS ONE. 2012;7:e37046. - PMC - PubMed

Methods References

    1. Dalerba P, et al. Single-cell dissection of transcriptional heterogeneity in human colon tumors. Nat. Biotechnol. 2011;29:1120–1127. - PMC - PubMed
    1. Babraham Institute, Babraham Bioinformatics. FASTQC. bioinformatics.bbsrc.ac.uk. at < http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc>.
    1. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. 2011. 2011;17
    1. Schmieder R, Edwards R. Quality control and preprocessing of metagenomic datasets. Bioinformatics. 2011;27:863–864. - PMC - PubMed
    1. Langmead B, Trapnell C, Pop M, Salzberg S. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25. - PMC - PubMed

Publication types

MeSH terms

Substances

Grants and funding

LinkOut - more resources