Single cell RNA sequencing of 13 human tissues identify cell types and receptors of human coronaviruses

Abstract

The new coronavirus (SARS-CoV-2) outbreak that began in December 2019 in Wuhan, Hubei, China, has been declared a global public health emergency. Angiotensin I converting enzyme 2 (ACE2) is the host receptor used by SARS-CoV-2 to infect human cells. Although ACE2 is reported to be expressed in lung, liver, stomach, ileum, kidney and colon, its expression levels are rather low, especially in the lung. SARS-CoV-2 may use co-receptors or auxiliary proteins as ACE2 partners to facilitate virus entry. To identify potential candidates, we explored a single cell gene expression atlas comprising 119 cell types from 13 human tissues and analyzed the single cell co-expression spectrum of 51 reported RNA virus receptors and 400 other membrane proteins. Consistent with other recent reports, we confirmed that ACE2 was mainly expressed in lung AT2 cells, liver cholangiocytes, colon colonocytes, esophagus keratinocytes, ileum ECs, rectum ECs, stomach epithelial cells, and kidney proximal tubules. Intriguingly, we found that the candidate co-receptors showing the most similar expression patterns to ACE2 across the 13 human tissues are all peptidases: ANPEP, DPP4 and ENPEP. Among them, ANPEP and DPP4 are known receptors for human CoVs, suggesting ENPEP as another potential receptor for human CoVs. We also conducted a CellPhoneDB analysis to understand the crosstalk between CoV target cells and their surrounding cells across different tissues. We found that macrophages frequently communicate with the CoV target cells through chemokine and phagocytosis signaling, highlighting the importance of tissue macrophages in immune defense and immune pathogenesis.

Keywords: Coronaviruses, SARS-CoV-2, scRNA-seq, ACE2, Co-receptor, Macrophage

1. Introduction

In December 2019, a novel coronavirus (SARS-CoV-2) infection emerged in Wuhan. More than 80,000 people had been infected with SARS-CoV-2 as of March 6, 2020, showing that SARS-CoV-2 is highly contagious. Coronaviruses are single-stranded RNA (ssRNA) viruses [1] and include the well-known Middle East respiratory syndrome coronavirus (MERS-CoV) and severe acute respiratory syndrome coronavirus (SARS-CoV). The symptoms caused by SARS-CoV-2 infection include acute respiratory distress syndrome (∼29%), acute cardiac injury (∼12%) and acute kidney injury (∼7%) [2], implying that SARS-CoV-2 may infect various human tissues.

Viruses bind to host receptors on the target cell surface to establish infection. Membrane fusion mediated by membrane proteins allows the entry of enveloped viruses [3]. As recently reported, both SARS-CoV-2 and SARS-CoV can use the ACE2 protein to gain entry into cells [4,5]. Since the outbreak, many data analyses have shown a wide distribution of ACE2 across human tissues, including lung, liver, stomach, ileum, colon and kidney [6], indicating that SARS-CoV-2 may infect multiple organs. However, these data showed that AT2 cells (the main target cells of SARS-CoV-2) in the lung actually express rather low levels of ACE2 [6]. Hence, SARS-CoV-2 may depend on co-receptors or other auxiliary membrane proteins to facilitate its infection. It is reported that viruses tend to hijack co-expressed proteins as their host factors [7]. For example, Hoffmann et al. recently showed that SARS-CoV-2-S uses ACE2 for entry and depends on the cellular protease TMPRSS2 for priming [5], showing that SARS-CoV-2 infection also requires multiple factors. Understanding receptor usage by viruses could facilitate the development of intervention strategies. Therefore, identifying the potential co-receptors or auxiliary membrane proteins for SARS-CoV-2 is of great significance.

For this purpose, we collected single cell gene expression matrices from 13 relatively normal human tissues, consisting of lung [8], liver [9], ileum [10], rectum [10], blood [11], bone marrow [12], skin [13], spleen [14], esophagus [14], colon [15], eye [16], stomach [17] and kidney [18], from the published literature. We analyzed the single cell co-expression profiles of 51 known ssRNA viral receptors and 400 membrane proteins, including ACE2, in the 119 identified cell types across the 13 human tissues. We then ran CellPhoneDB to identify immune cells that frequently crosstalk with CoV target cells in multiple tissues.

2. Materials and Methods

2.1. Data collection

The raw gene counts or normalized gene expression matrix for each single cell were downloaded from the GEO (https://www.ncbi.nlm.nih.gov/geo/) or Human Cell Atlas (https://www.humancellatlas.org) database (Table S1). In total, we collected single cell gene expression data for 13 tissues: liver, lung, colon, ileum, rectum, blood, spleen, bone marrow, eye, skin, stomach, esophagus and kidney. The data sources and sample information are as follows. Liver, GEO Accession No. GSE115469, 5 normal human donors; Lung, GEO Accession No. GSE130148, 4 human donors who died from hypoxic brain damage; Colon, GEO Accession No. GSE116222, 3 healthy volunteers; Ileum and rectum, GEO Accession No. GSE125970, a total of 4 intestinal mucosa samples taken at least 10 cm away from the tumor border; Skin, GEO Accession No. GSE132802, 4 healthy volunteers; PBMC, GEO Accession No. GSE136103, 4 samples from cirrhotic patients; Spleen and esophagus, available from the Human Cell Atlas, a total of 11 cardiac death donors; Bone marrow, GEO Accession No. GSE120221, 5 healthy donors (A, E, J, R, U); Eye, GEO Accession No. GSE135922, 3 macula and 3 periphery samples from human donor eyes; Stomach, GEO Accession No. GSE134520, 3 non-atrophic gastritis patients; Kidney, GEO Accession No. GSE131685, 3 normal kidney tissues obtained at least 2 cm away from tumor tissue.

The high-quality virus-host receptor interactions were downloaded from Viral Receptor database (http://www.computationalbiology.cn:5000/viralReceptor), which curated 152 pairs of mammalian virus-host receptor interactions and 51 virus receptors from 9 mammal species. The membrane proteins were extracted from Membranome database (https://membranome.org).

2.2. Data processing, quality control and normalization

The raw count matrix (UMI counts per gene per cell) was processed with Seurat [19]. Cells with fewer than 100 expressed genes (UMI count > 0) and cells with more than 25% mitochondrial transcripts were removed. Genes expressed in fewer than three cells were also removed. We then normalized the gene expression data using the “NormalizeData” function with default settings. Sources of cell-to-cell variation driven by batch were regressed out, using the number of detected UMIs and mitochondrial gene expression as covariates, with the “ScaleData” function. The corrected expression matrix was used for cell clustering and dimensional reduction.
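The filtering and normalization steps above can be sketched in plain Python. This is an illustrative approximation and not the Seurat code used in the study: the function name `filter_and_normalize`, the `MT-` prefix used to flag mitochondrial genes, and the list-of-lists input format are assumptions made for the example. The normalization mirrors the log-normalization that Seurat applies by default (counts divided by the cell total, multiplied by 10,000, then log1p).

```python
import math


def filter_and_normalize(counts, gene_names, mito_prefix="MT-",
                         min_genes=100, max_mito_frac=0.25,
                         min_cells=3, scale_factor=10_000):
    """Toy QC and log-normalization; `counts` is a cells x genes list of UMI counts."""
    # Assumption: mitochondrial genes are recognizable by their name prefix.
    mito_idx = [j for j, g in enumerate(gene_names) if g.startswith(mito_prefix)]

    # Keep cells with enough detected genes and an acceptable mitochondrial fraction.
    kept_cells = []
    for row in counts:
        total = sum(row)
        n_genes = sum(1 for c in row if c > 0)
        mito_frac = sum(row[j] for j in mito_idx) / total if total else 1.0
        if n_genes >= min_genes and mito_frac <= max_mito_frac:
            kept_cells.append(row)

    # Keep genes detected in at least `min_cells` of the retained cells.
    kept_genes = [j for j in range(len(gene_names))
                  if sum(1 for row in kept_cells if row[j] > 0) >= min_cells]

    # Log-normalize: scale each cell to `scale_factor` total counts, then log1p.
    normalized = []
    for row in kept_cells:
        total = sum(row[j] for j in kept_genes)
        normalized.append([math.log1p(row[j] / total * scale_factor) if total else 0.0
                           for j in kept_genes])
    return normalized, [gene_names[j] for j in kept_genes]
```

With the thresholds from the text, the call would be `filter_and_normalize(counts, gene_names)`; the regression step performed by “ScaleData” is not reproduced here.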

2.3. Cell clustering, dimensional reduction and visualization

Cell clustering and dimensional reduction were performed with the Seurat package. Beforehand, we chose 2000 highly variable genes (HVGs) from the corrected expression matrix using the “FindVariableGenes” function in the Seurat package, and then centered and scaled them. We then performed principal component analysis (PCA) on the HVGs using the “RunPCA” function. To improve the signal-to-noise ratio, we selected a number of significant principal components using the “JackStraw” function, which is based on a permutation test. Specifically, we first computed 50 principal components and then selected the significant components according to the p-values produced by the “ScoreJackStraw” function for further analysis. Batch effects were removed with the harmony package [20].
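As a rough illustration of the gene selection and scaling step, the sketch below ranks genes by their variance-to-mean ratio and z-scores the selected genes. It is a simplified stand-in and does not reproduce the procedure inside Seurat's “FindVariableGenes”; the function names `top_variable_genes` and `center_and_scale` and the cells x genes list input are assumptions made for the example.

```python
from statistics import mean, pstdev, pvariance


def top_variable_genes(expr, gene_names, n_top=2000):
    """Rank genes by dispersion (variance / mean) and return the top `n_top` names."""
    scored = []
    for j, gene in enumerate(gene_names):
        column = [row[j] for row in expr]
        mu = mean(column)
        if mu <= 0:
            continue  # genes with no expression carry no dispersion signal
        scored.append((pvariance(column, mu) / mu, gene))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [gene for _, gene in scored[:n_top]]


def center_and_scale(expr, gene_names, selected):
    """Z-score each selected gene across cells; returns cells x selected genes."""
    index = {gene: j for j, gene in enumerate(gene_names)}
    columns = []
    for gene in selected:
        column = [row[index[gene]] for row in expr]
        mu = mean(column)
        sd = pstdev(column, mu)
        columns.append([(v - mu) / sd if sd > 0 else 0.0 for v in column])
    # Transpose back so that each row is again one cell.
    return [list(cell) for cell in zip(*columns)]
```

The scaled matrix would then be the input to PCA; the PCA and JackStraw steps themselves are left out of this sketch.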

Cells were then clustered using the “FindClusters” function, which embeds cells into a graph structure in PCA space. We set the resolution parameter to 0.8 to identify only major cell types, e.g. T cells, B cells or macrophages. The clustered cells were then projected onto a two-dimensional space using the “RunUMAP” function. The clustering results were visualized with the “DimPlot” function.

2.4. Cell type identification

To annotate cell clusters, we first identified the differentially expressed genes of each cluster using the “FindMarkers” function. The cell clusters were then annotated according to curated known cell markers (Fig. S1). Cell clusters that consistently expressed the same cell markers were merged.

2.5. Cell-cell interaction analysis

We conducted cell-cell interaction analysis using the cellphonedb tool provided with the CellPhoneDB database [21]. Significant cell-cell interactions were selected at p-value < 0.01.
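To make the significance threshold concrete, the following is a minimal sketch of a label-permutation test of the kind CellPhoneDB is built around: the score of a ligand-receptor pair between two cell types is compared with scores obtained after shuffling the cell type labels. It is not the CellPhoneDB implementation; the function names, the scoring rule (mean of the two cluster averages) and the default of 1000 permutations are assumptions chosen for illustration.

```python
import random


def cluster_mean(values, labels, cluster):
    """Average expression of one gene over the cells carrying `cluster` as label."""
    picked = [v for v, lab in zip(values, labels) if lab == cluster]
    return sum(picked) / len(picked) if picked else 0.0


def interaction_p_value(ligand_expr, receptor_expr, labels,
                        sender, receiver, n_perm=1000, seed=0):
    """Return (observed score, empirical p-value) for one ligand-receptor pair."""
    def score(current_labels):
        # Mean of ligand expression in the sender and receptor expression in the receiver.
        return (cluster_mean(ligand_expr, current_labels, sender)
                + cluster_mean(receptor_expr, current_labels, receiver)) / 2

    observed = score(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between cells and cell types
        if score(shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm
```

An interaction would be retained under the cutoff in the text when the returned p-value is below 0.01.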

3. Results

3.1. Cell type identification in 13 human tissues

We collected single cell RNA sequencing data (raw count or normalized gene expression matrices) from published studies that had been deposited in public databases, e.g. GEO (https://www.ncbi.nlm.nih.gov/geo/) or the Human Cell Atlas (https://www.humancellatlas.org). In total, we curated single cell gene expression matrices of 13 human tissues, including lung [8], liver [9], ileum [10], rectum [10], blood [11], bone marrow [12], skin [13], spleen [14], esophagus [14], colon [15], eye [16], stomach [17] and kidney [18] (Table S1). For each tissue, we performed cell clustering and dimension reduction on the scaled gene expression matrix using the Seurat package [19]. After filtering out low quality cells, we obtained 8443, 43,474, 4248, 5282, 3279, 30,693, 97,695, 17,131, 4335, 11,552, 4871, 8880 and 20,197 cells from liver, lung, colon, ileum, rectum, blood, spleen, bone marrow, eye, skin, stomach, esophagus and kidney, respectively (Table S1). The cell clusters were then annotated using canonical markers drawn from published articles (Fig. S1). We finally annotated 119 cell types from the 13 human tissues.

The lung belongs to the respiratory system, and 13 cell types were identified in it (Fig. S2). These cell types consist of macrophages, alveolar type 2 cells (AT2), monocytes, NK&T cells, ciliated cells, basal cells, mast cells, neutrophils, alveolar type 1 cells (AT1), fibroblasts, endothelial cells, lymphatic cells and B cells. Among them, ∼31% of cells are alveolar cells (AT2 and AT1) and ∼54% are immune cells (B cells, T cells and myeloid cells).

Ileum, rectum, esophagus, colon and stomach are part of the digestive system. In esophagus, 7 cell types were identified (Fig. S2). Keratinocytes account for the highest percentage (∼66%) of total cells. The remaining cells are B cells, epithelial basal cells, gland cells, stroma cells, T cells and vessel cells. We detected 11 cell types in stomach (Fig. S2), of which epithelial cells (∼29%) and pit mucous cells (PMCs) (∼23%) constitute the largest groups. In addition, B cells, endothelial cells (ECs), enteroendocrine cells, fibroblasts, antral basal gland mucous cells (GMCs), macrophages, neck-like cells, proliferative cells (PCs) and T cells were identified in stomach. In ileum (Fig. S2), 7 cell types were annotated: enterocytes (ECs), enteroendocrine cells (EECs), goblet cells (Gs), Paneth cells (PCs), progenitor cells (PROs), stem cells (SCs) and transient amplifying cells (TAs). Enterocytes are the largest cell population (∼64%) in ileum. Rectum shares its cell types with ileum, but its largest cell population is progenitor cells (∼37%). A total of 9 cell types were detected in colon (Fig. S2). Colonocytes (colonocytes and crypt top colonocytes) account for 53% of cells. BEST4+ cells, enteroendocrine cells (EECs), goblet cells, innate lymphoid cells, mast cells, T cells and undifferentiated cells were also identified.

Liver, spleen and skin play vital roles in the immune system. Immune cells (∼45%) and hepatocytes (∼42%) account for the vast majority of cells in liver (Fig. S2). Cholangiocytes, endothelial cells and erythroid cells were also detected. The spleen is an immune organ, in which all the cells are immune cells (Fig. S2). It is composed largely of T cells (∼33%) and B cells (∼43%). In addition, CD34 progenitor cells, cDCs, dividing cells, innate lymphoid cells, macrophages, monocytes, neutrophils, NK cells and pDCs make up the spleen cell populations. Skin is a physical barrier against the external environment. We identified a total of 7 cell populations in skin (Fig. S2), of which pericytes (∼31%) and fibroblasts (∼21%) are the most enriched. Immune cells, comprising T cells and myeloid cells, were also identified. Basal cells, endothelial cells and suprabasal keratinocytes make up the remaining skin cell populations.

The kidneys are part of the urinary system. Most of the cells in kidney are proximal tubule cells (Proximal Ts) (∼82%) (Fig. S2). In addition, we annotated immune cells (∼8%), collecting duct intercalated cells (Collecting DIs), collecting duct principal cells (Collecting DPs), distal tubule cells (Distal Ts) and glomerular parietal epithelial cells (Glomerular PEs) in kidney.

The eyes are sensory organs of the nervous system. Ten cell types were identified in eyes (Fig. S2). Fibroblasts and immune cells make up ∼31% and ∼25% of total cells, respectively. Eyes also contain endothelial cells, melanocytes, pericytes, retinal pigment epithelium (RPEs) and Schwann cells.

Bone marrow is the primary site of hematopoiesis. We identified large numbers of NK/NKT cells (∼44%) and erythrocytes (∼28%) in bone marrow (Fig. S2). B cells, hematopoietic stem cells, MK progenitors, monocytes, neutrophils and DCs were also detected in bone marrow.

Blood circulates through the various tissues. T cells (∼55%) and monocytes (∼32%) make up the largest proportions of blood cells (Fig. S2). In addition, we identified B cells, cDCs, macrophages, NK cells, pDCs and platelets in blood.

3.2. Expression atlas of ACE2, ssRNA viral receptors and other membrane proteins in 13 human tissues

In the viral life cycle, viruses first bind host receptors on the cell surface. Hence, the distribution of viral receptors in different cell types of diverse tissues can reveal the viral tropism and potential transmission routes. We therefore explored the expression spectrum of host receptors.

We first analyzed the expression pattern of ACE2 across the 13 tissues (Fig. 1). Our results reveal that ACE2 is expressed in lung AT2 cells, liver cholangiocytes, colon colonocytes, esophagus keratinocytes, ileum ECs, rectum ECs, stomach epithelial cells, and kidney proximal tubules, consistent with recent reports [6]. However, ACE2 expression levels are rather low in lung AT2 cells (4.7-fold lower than the average expression level of all ACE2-expressing cell types). We hypothesize that the presence of co-receptors or other auxiliary membrane proteins in AT2 cells may facilitate the binding and entry of SARS-CoV-2.

Fig. 1.

The expression profiles of ACE2 in 13 human tissues. The single cell expression maps of ACE2 in lung, liver, stomach, ileum, rectum, colon, blood, bone marrow, spleen, esophagus, kidney, skin and eye. ACE2 is expressed in lung AT2 (alveolar type 2 cells), liver cholangiocytes, colon colonocytes, esophagus keratinocytes, ileum ECs (enterocytes), rectum ECs, stomach epithelial cells, and kidney proximal tubules. No ACE2 transcripts were found in bone marrow or blood. PMCs, pit mucous cells; GMCs, antral basal gland mucous cells; PCs, proliferative cells in stomach and Paneth cells in ileum; EECs, enteroendocrine cells; Gs, goblet cells; PROs, progenitor cells; SCs, stem cells; TAs, transient amplifying cells; RPEs, retinal pigment epithelium.

We then analyzed the co-expression features of the human ssRNA viral receptors and membrane proteins. We collected a total of 152 pairs of high-quality virus-host receptor interactions from the Viral Receptor database [7], which contain 51 host receptors in 9 hosts and 96 viruses (Table S2). Furthermore, 400 membrane proteins were extracted from the Membranome database [22]. In total, 451 genes were curated, 95.7% (432/451) of which are expressed in at least one of the 13 tissues.

To explore the potential relationship between ACE2 and other membrane proteins or viral receptors, we calculated the Pearson correlation coefficient between every pair of genes in the curated set. We found that the expression of 94 genes is significantly correlated with that of ACE2 (P < 0.01). Of note, ANPEP, ENPEP and DPP4 are the top three genes correlated with ACE2 (R > 0.8) (Fig. 2). ANPEP, alanyl aminopeptidase, is a host receptor targeted by porcine epidemic diarrhoea virus, human coronavirus 229E, feline coronavirus, canine coronavirus, transmissible gastroenteritis virus and infectious bronchitis virus. These viruses all belong to the Coronaviridae. ANPEP is mainly expressed in colon, ileum, rectum, kidney, liver and skin (Figs. S3 and S4), suggesting that coronavirus receptors may have similar expression profiles in the human body. ENPEP, glutamyl aminopeptidase, belongs to the peptidase M1 family of mammalian type II integral membrane zinc-containing endopeptidases. ENPEP regulates blood pressure and blood vessel formation through the catabolic pathway of the renin-angiotensin system [23]. The relationship between ENPEP and viral infection is unknown. DPP4, the receptor of MERS-CoV, shows an expression pattern similar to that of ACE2, except that DPP4 is also expressed in some T cells of all the observed tissues (Figs. S3 and S4). All three genes encode peptidases, which are uniquely adopted by coronaviruses as their receptors [24]. This result raises the possibility that ENPEP may be another, as yet unknown, receptor for coronaviruses. To further consolidate the findings, we calculated the Euclidean distance between all the curated proteins and constructed their hierarchical relationships across the 119 cell types. DPP4 was the first gene clustered with ACE2.

Fig. 2.

Pearson correlation coefficients between the curated ssRNA viral receptors and membrane proteins. Warm colors indicate positive correlation and cold colors indicate negative correlation. Stars mark correlation coefficients greater than 0.8. ANPEP, ENPEP and DPP4 show the highest correlation with ACE2.

Together, our data show that coronavirus receptors tend to share co-expression patterns across different tissues, consistent with the fact that CoVs infect similar types of cells and CoV-infected patients share similar clinical symptoms.

3.3. Macrophages frequently interact with the ACE2-expressing cells

Virus-infected cells can recruit and modulate immune cells by secreting chemokines or other cytokines. We sought to identify immune cells that potentially crosstalk with CoV target cells. The cell-cell interaction analysis was conducted with CellPhoneDB [21]. Interactions with p-value < 0.01 were used to construct the interaction network between cell types in each tissue.

Treating the ACE2-expressing cell types as ligand-secreting cells, we calculated the total number of interactions with each receptor-expressing cell type. We found that macrophages showed the most active interaction with ACE2-expressing cells in liver, lung and stomach (Fig. 3A), sharing the CD74-MIF signaling pair (Fig. 3B). CD74 is expressed on the surface of antigen-presenting cells and acts as a cytokine receptor in immune cells. MIF, macrophage migration inhibitory factor, is a pro-inflammatory cytokine that participates in inflammatory and immune responses. In addition, PROs, SCs and TAs responded strongly to the ACE2-expressing cells in ileum and rectum. In colon, innate lymphoid cells (ILCs) frequently interacted with the ACE2-expressing cells. Glomerular parietal epithelial cells in kidney and epithelial basal cells in esophagus also interacted with the ACE2-transcribing cells at very high frequency.
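The counting step described above can be sketched as a simple tally over a table of tested interactions. The record layout (sender, receiver, ligand-receptor pair, p-value), the function name and the toy records are assumptions made for illustration; they do not reflect the actual CellPhoneDB output format or results from this study.

```python
from collections import Counter


def count_partner_interactions(records, ace2_cell_types, p_cutoff=0.01):
    """Count significant interactions sent by ACE2-expressing cells, per receiving cell type."""
    counts = Counter()
    for sender, receiver, _pair, p_value in records:
        if sender not in ace2_cell_types:
            continue  # only ACE2-expressing cell types act as ligand senders here
        if receiver == sender or p_value >= p_cutoff:
            continue  # skip self-interactions and non-significant pairs
        counts[receiver] += 1
    return counts.most_common()


# Hypothetical toy records for one tissue (pair names and p-values are made up).
toy_records = [
    ("AT2", "Macrophage", "MIF_CD74", 0.001),
    ("AT2", "Macrophage", "LIGAND1_RECEPTOR1", 0.005),
    ("AT2", "T cell", "LIGAND2_RECEPTOR2", 0.002),
    ("AT2", "B cell", "LIGAND3_RECEPTOR3", 0.2),
    ("Fibroblast", "Macrophage", "LIGAND4_RECEPTOR4", 0.001),
    ("AT2", "AT2", "LIGAND5_RECEPTOR5", 0.001),
]
ranking = count_partner_interactions(toy_records, {"AT2"})
```

The receiving cell type at the top of the returned list is the one with the most significant interactions, which is how a statement such as "macrophages showed the most active interaction" can be read.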

Fig. 3.

Cell-cell interactions between cell types in the 8 tissues expressing ACE2. (A) The cell-cell interaction analysis was conducted with CellPhoneDB. Cell types are nodes and interactions are edges. Red nodes indicate the major cell types expressing ACE2 in each tissue. The size of each cell type node is proportional to its total number of interactions with the red nodes. (B) The cytokines that connect ACE2-expressing cells and macrophages. Macrophages express the receptors and ACE2-expressing cells express the ligands.

We conclude that the SARS-CoV-2 target cells (ACE2-expressing cells) can interact with various cell types in different tissues, especially macrophages in lung, liver and stomach. Macrophages may be recruited by SARS-CoV-2 target cells through the CD74-MIF interaction and other signaling pathways during infection, and may play both defensive and destructive roles.

4. Discussion

Coronaviruses are a large family of ssRNA viruses that cause respiratory diseases in humans. Most coronaviruses are associated with mild clinical symptoms, except SARS-CoV and MERS-CoV, which show fatality rates of 9.6% and 34%, respectively [25,26]. In late December 2019, a novel coronavirus, named SARS-CoV-2, emerged in Wuhan, Hubei, China, and a total of 80,710 SARS-CoV-2 infections had been confirmed as of March 6, 2020. The phylogenetic tree constructed from full-genome sequences indicated that SARS-CoV-2 forms a clade distinct from SARS-CoV and MERS-CoV [4].

The most common symptoms of patients infected with SARS-CoV-2 are fever and cough [27]. However, a proportion of patients show multi-organ damage and dysfunction, including acute respiratory distress syndrome (17%), acute respiratory injury (8%) and acute renal injury (3%). It is also increasingly recognized that SARS-CoV-2 could be transmitted via multiple routes.

Viruses target host cells by binding host receptors before starting the infection cycle. ACE2 has been shown to be the cell receptor of SARS-CoV-2, the same receptor as for SARS-CoV. The expression profiles of ACE2 across different cell types of different organs can provide clues to the transmission routes of the virus and its potential pathogenesis. In previous studies, ACE2 was found to be expressed in upper and stratified epithelial cells of the esophagus, absorptive enterocytes of ileum and colon, alveolar type II cells in lung, liver cholangiocytes and kidney proximal tubules. These findings suggested that the clinical symptoms of hepatic failure, respiratory injury, acute kidney injury or diarrhoea may be associated with the widespread ACE2-expressing cells in these tissues. However, we and others found that ACE2 is expressed at low levels, especially in the lung (the main target organ of SARS-CoV-2), raising the possibility that co-receptors facilitate SARS-CoV-2 infection. It is well recognized that ssRNA viruses tend to have multiple receptors [7]. For example, ACE2, CD209 (Dendritic Cell-Specific ICAM-3-Grabbing Non-Integrin 1), CLEC4G (C-type lectin domain family 4 member G) and CLEC4M (C-type lectin domain family 4 member M) are receptors of SARS-CoV [[28], [29], [30], [31]]. In addition, other membrane proteins may also assist virus entry [3]. Since the viral receptor and co-receptors should be co-expressed on the same cell types, we analyzed single cell co-expression patterns covering 400 membrane proteins and 51 known viral receptors in this study. After calculating their gene expression similarity, we found that ANPEP, ENPEP and DPP4 are the top three genes correlated with ACE2 (R > 0.8). Interestingly, both ANPEP and DPP4 are viral receptors of human coronaviruses [32], while ENPEP is also a peptidase, although its involvement in virus infection is unclear. For reasons that remain unclear, human coronaviruses use peptidases as their receptors [24]. Here, we showed the co-expression profiles of these molecules, indicating that different human CoVs target similar cell types across different human tissues. This may also explain why patients infected with different human CoVs manifest similar clinical symptoms. We propose that further experimental validation should be performed to explore the role of these peptidases in infection by SARS-CoV-2 and other CoVs.

The host immune response plays crucial roles in the fight against viruses. Generally, virally infected cells release interferons to suppress viral activities [[33], [34], [35]]. Interferons also warn neighboring cells of the virus attack. They can signal nearby cells to upregulate MHC class I molecules, which allows CD8+ T cells to identify and eliminate the viral infection [36,37]. Understanding the potential cell-cell communication mechanisms across different tissues is important for understanding immune reactions. In this study, we investigated the cells that communicate with CoV targets (ACE2-expressing cells) in each tissue. Our results illustrate that macrophages frequently crosstalk with the ACE2-expressing cells in lung, liver, stomach and other tissues. This suggests that macrophages play a sentinel role during human CoV infection. Future studies should investigate these signaling pairs in the setting of CoV infection in patients and animal models.

Declaration of competing interest

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Appendix A. Supplementary data

The following are the Supplementary data to this article:

Multimedia component 1

Multimedia component 2

Fig. S1.

Fig. S2.

Fig. S3.

Fig. S4.
