Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus


KEYWORDS: 2019-nCoV, SARS coronavirus, angiotensin-converting enzyme 2, animal reservoir, cross-species transmission, human-to-human transmission

ABSTRACT

Recently, a novel coronavirus (2019-nCoV) has emerged from Wuhan, China, causing symptoms in humans similar to those caused by severe acute respiratory syndrome coronavirus (SARS-CoV). Since the SARS-CoV outbreak in 2002, extensive structural analyses have revealed key atomic-level interactions between the SARS-CoV spike protein receptor-binding domain (RBD) and its host receptor angiotensin-converting enzyme 2 (ACE2), which regulate both the cross-species and human-to-human transmissions of SARS-CoV. Here, we analyzed the potential receptor usage by 2019-nCoV, based on the rich knowledge about SARS-CoV and the newly released sequence of 2019-nCoV. First, the sequence of 2019-nCoV RBD, including its receptor-binding motif (RBM) that directly contacts ACE2, is similar to that of SARS-CoV, strongly suggesting that 2019-nCoV uses ACE2 as its receptor. Second, several critical residues in 2019-nCoV RBM (particularly Gln493) provide favorable interactions with human ACE2, consistent with 2019-nCoV’s capacity for human cell infection. Third, several other critical residues in 2019-nCoV RBM (particularly Asn501) are compatible with, but not ideal for, binding human ACE2, suggesting that 2019-nCoV has acquired some capacity for human-to-human transmission. Last, while phylogenetic analysis indicates a bat origin of 2019-nCoV, 2019-nCoV also potentially recognizes ACE2 from a diversity of animal species (except mice and rats), implicating these animal species as possible intermediate hosts or animal models for 2019-nCoV infections. These analyses provide insights into the receptor usage, cell entry, host cell infectivity and animal origin of 2019-nCoV and may help epidemic surveillance and preventive measures against 2019-nCoV.

IMPORTANCE The recent emergence of Wuhan coronavirus (2019-nCoV) puts the world on alert. 2019-nCoV is reminiscent of the SARS-CoV outbreak in 2002 to 2003. Our decade-long structural studies on the receptor recognition by SARS-CoV have identified key interactions between SARS-CoV spike protein and its host receptor angiotensin-converting enzyme 2 (ACE2), which regulate both the cross-species and human-to-human transmissions of SARS-CoV. One of the goals of SARS-CoV research was to build an atomic-level iterative framework of virus-receptor interactions to facilitate epidemic surveillance, predict species-specific receptor usage, and identify potential animal hosts and animal models of viruses. Based on the sequence of 2019-nCoV spike protein, we apply this predictive framework to provide novel insights into the receptor usage and likely host range of 2019-nCoV. This study provides a robust test of this reiterative framework, providing the basic, translational, and public health research communities with predictive insights that may help study and battle this novel 2019-nCoV.

INTRODUCTION

A novel coronavirus (2019-nCoV) from Wuhan, China, has recently caused over 500 confirmed cases of human infections and at least 17 deaths in China (https://www.cdc.gov/coronavirus/novel-coronavirus-2019.html). There are also numerous confirmed cases of 2019-nCoV infections in other countries including the United States. Many of the symptoms caused by 2019-nCoV, such as acute respiratory syndrome, are similar to those caused by severe acute respiratory syndrome coronavirus (SARS-CoV). SARS-CoV emerged in 2002 to 2003 and transmitted among humans, causing over 8,000 confirmed cases of human infections and about 800 deaths (1–4). It briefly reemerged in 2003 to 2004, with 4 confirmed cases of mild human infections and no human-to-human transmission (5–7). SARS-CoV has also been isolated from animals and been adapted to lab cell culture (5, 8–11). It is believed that bats and palm civets were the natural and intermediate reservoirs for SARS-CoV, respectively, and that SARS-CoV transmitted from palm civets to humans in an animal market in Southern China (12–14). It has been reported that 2019-nCoV also infected humans in an animal market in Wuhan, although the animal source of the outbreak is currently unknown. Moreover, it has been confirmed that 2019-nCoV has the capacity to transmit from human to human.

Coronaviruses are a large family of single-stranded enveloped RNA viruses and can be divided into four major genera (15). Both SARS-CoV and 2019-nCoV belong to the β-genus. An envelope-anchored spike protein mediates coronavirus entry into host cells by first binding to a host receptor and then fusing viral and host membranes (16). A defined receptor-binding domain (RBD) of SARS-CoV spike specifically recognizes its host receptor angiotensin-converting enzyme 2 (ACE2) (17, 18). Different lines of research have shown that host susceptibility to SARS-CoV infection is primarily determined by the affinity between the viral RBD and host ACE2 in the initial viral attachment step (19–23). In a span of about 10 years, we determined a series of crystal structures of SARS-CoV RBD complexed with ACE2; the RBDs were from SARS-CoV strains isolated from different host species in different years and the ACE2 receptor orthologues were derived from different animal species (18, 24–26). These structures showed that SARS-CoV RBD contains a core structure and a receptor-binding motif (RBM) and that the RBM binds to the outer surface of the claw-like structure of ACE2 (Fig. 1A) (25). Importantly, we identified two virus-binding hot spots on human ACE2 (24, 26). A number of naturally selected RBM mutations occurred near these two virus-binding hot spots, and these residues largely determined the host range of SARS-CoV (Fig. 1B and C). Furthermore, we discovered specific amino acids at the 442, 472, 479, 480, and 487 positions that enhance viral binding to human ACE2 and some other amino acids at these same positions that enhance viral binding to civet ACE2 (Fig. 1C). Importantly, when all human-ACE2-favoring residues were combined into one RBD, this RBD bound to human ACE2 with super affinity and the corresponding spike protein mediated viral entry into human cells with super efficiency (Fig. 1C) (26). An RBD with super affinity for civet ACE2 was also designed and empirically confirmed (Fig. 1C) (26). These gain-of-function data provided strong supporting evidence for the accuracy of our structural predictions. A long-term goal of these earlier studies was to establish a structure-function predictive framework for improved epidemic surveillance. More specifically, we aim to predict the receptor usage and host cell infectivity of future SARS-CoV or SARS-like viral strains and identify their possible animal origins and animal models, based on the sequences of their spike proteins and the known atomic structures of the original SARS-CoV RBD/ACE2 complex. Here, based on the newly released sequence of 2019-nCoV RBD, we reiteratively apply this predictive framework to provide novel insights into the receptor usage and likely host range of 2019-nCoV.

FIG 1

Structural analysis of human ACE2 recognition by 2019-nCoV and SARS-CoV. (A) Overall structure of human SARS-CoV RBD (year 2002) complexed with human ACE2. PDB ID is 2AJF. ACE2 is in green, the core of RBD (receptor-binding domain) is in cyan, and RBM (receptor-binding motif) is in magenta. (B) Critical residue changes in the RBMs of SARS-CoV and 2019-nCoV. All five of these residues in SARS-CoV underwent natural selection and were shown to be critical for ACE2 recognition, cell entry, and host range of SARS-CoV. The residue numbers are shown as in SARS-CoV RBD, with the corresponding residue numbers in 2019-nCoV shown in parentheses. For viral adaptation to ACE2, > means “is more adapted,” >>> means “is much more adapted,” and = means “is similarly adapted.” Information about the two most critical residues, 479 and 487, is in red. (C) Experimentally determined structure of the interface between a designed SARS-CoV RBD (optimized for human ACE2 recognition) and human ACE2. PDB ID is 3SCI. (D) Modeled structure of the interface between 2019-nCoV RBD and human ACE2. Here, mutations were introduced to the RBD region in panel C based on sequence differences between SARS-CoV and 2019-nCoV. GenBank accession numbers are MN908947.1 for 2019-nCoV spike, NC_004718.3 for human SARS-CoV spike (year 2002; strain Tor2), AGZ48818.1 for bat SARS-CoV spike (year 2013; strain Rs3367), AY304486.1 for civet SARS-CoV spike (year 2002; SZ3), and AY525636 for human/civet SARS-CoV spike (year 2003; strain GD03). References for the other sequences are in parentheses as follows: civet SARS-CoV spike (year 2005) (9); human SARS-CoV spike (year 2008) (8).

RESULTS

The 2019-nCoV spike phylogeny is firmly rooted among other β-genus lineage b bat SARS-like coronaviruses (Fig. 2) but is ancestral to both human SARS-CoV (epidemic strain isolated in year 2002) and bat SARS-CoV strains that use ACE2 receptor to enter and infect primary host lung cells (11, 17). The overall sequence similarities between 2019-nCoV spike and SARS-CoV spike (isolated from human, civet, or bat) are around 76% to 78% for the whole protein, around 73% to 76% for the RBD, and 50% to 53% for the RBM (Fig. 3A and B). In comparison, human coronavirus Middle East respiratory syndrome coronavirus (MERS-CoV) and bat MERS-like coronavirus HKU4 share lower sequence similarities in their spikes, RBDs, or RBM (Fig. 3C), and yet they recognize the same receptor dipeptidyl peptidase 4 (DPP4) (27, 28). Thus, sequence similarities between 2019-nCoV and SARS-CoV spikes suggest the possibility for them to share the same receptor ACE2. Importantly, compared to SARS-CoV RBM, 2019-nCoV RBM does not contain any deletion or insertion (except for a one-residue insertion on a loop away from the ACE2-binding region) (Fig. 3A), providing additional evidence that 2019-nCoV uses ACE2 as its receptor. Furthermore, among the 14 ACE2-contacting residues in the RBD, 9 are fully conserved and 4 are partially conserved among 2019-nCoV and SARS-CoV from human, civet, and bat (Fig. 3A). A final piece of strong evidence supporting ACE2 as the receptor for 2019-nCoV surrounds the five residues in 2019-nCoV RBM that underwent natural selections in SARS-CoV and played critical roles in the cross-species transmission of SARS-CoV (i.e., residues 442, 472, 479, 480, and 487 in SARS-CoV RBD) (Fig. 1B). We discuss these residues in more detail below.
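The percentages quoted above come from pairwise comparison of aligned sequences. As a minimal sketch of that kind of calculation, the Python function below computes percent identity over the ungapped columns of two pre-aligned sequences. The function name, the gap-handling rule, and the toy fragments are illustrative assumptions; the source does not state which exact similarity definition was used, and alignment tools differ on how gaps and conservative substitutions are counted.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of ungapped aligned columns in which both sequences carry the same residue.

    Assumption: columns with a gap ('-') in either sequence are skipped.
    Other tools may count gaps as mismatches or score similar residues.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy fragments for illustration only; these are NOT real spike sequences.
toy_a = "NLYQ-AGSTP"
toy_b = "NFYQKAGNTP"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")  # 7 of 9 ungapped columns match
```

Applied separately to the whole spike, the RBD, and the RBM regions of an alignment, a function like this would give region-by-region figures comparable in spirit to those in Fig. 3B and C.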

FIG 2

Spike phylogeny of representative β-genus lineage b coronaviruses. The spike protein sequences of selected β-genus lineage b coronaviruses were aligned and phylogenetically compared. Sequences were aligned using free end gaps with the Blosum62 cost matrix in Geneious Prime. The tree was constructed using the neighbor-joining method based on the multiple sequence alignment, also in Geneious Prime. Numbers at the end of each sequence correspond to the GenBank accession number. The radial phylogram was exported from Geneious and then rendered for publication using EvolView (evolgenius.info) and Adobe Illustrator CC 2020.

FIG 3

Sequence comparison of 2019-nCoV and SARS-CoV. (A) Sequence alignment of SARS-CoV and 2019-nCoV RBDs. RBM residues are in magenta. The five critical residues in Fig. 1B are in blue. ACE2-contacting residues are shaded. Asterisks indicate positions that have a single, fully conserved residue. Colons indicate positions that have strongly conserved residues. Periods indicate positions that have weakly conserved residues. (B) Sequence similarities of SARS-CoV and 2019-nCoV in the spike protein, RBD, and RBM, respectively. (C) Sequence similarities of MERS-CoV and HKU4 virus in the spike protein, RBD, and RBM, respectively. GenBank accession numbers are JX869059.2 for human MERS-CoV spike and NC_009019.1 for bat HKU4-CoV spike.

First, residue 493 in 2019-nCoV RBD (corresponding to residue 479 in SARS-CoV) is a glutamine (Fig. 1B and D). A previously designed SARS-CoV RBD is optimal for binding to human ACE2 (Fig. 1B and C) (26). According to the structure of this designed RBD, residue 479 is located near virus-binding hot spot Lys31 (i.e., hot spot 31) on human ACE2 (Fig. 1C). Hot spot 31 consists of a salt bridge between Lys31 and Glu35 buried in a hydrophobic environment. In civet SARS-CoV RBD (year 2002), residue 479 is a lysine, which imposes steric and electrostatic interference with hot spot 31. In human SARS-CoV RBD (year 2002), residue 479 becomes an asparagine. The K479N mutation removes the unfavorable interaction at the RBD-human ACE2 interface, enhances viral binding to human ACE2, and plays a critical role in the civet-to-human transmission of SARS-CoV (Fig. 1C) (24–26). Here, we constructed a structural model for the complex of 2019-nCoV RBD and human ACE2 (Fig. 1D). Importantly, Gln493 in 2019-nCoV RBD is compatible with hot spot 31, suggesting that 2019-nCoV is capable of recognizing human ACE2 and infecting human cells.

Second, residue 501 in 2019-nCoV RBD (corresponding to residue 487 in SARS-CoV) is an asparagine (Fig. 1B and D). Based on our previous structural analysis, residue 487 in SARS-CoV is located near virus-binding hot spot Lys353 (i.e., hot spot 353) on human ACE2 (Fig. 1C) (26). Hot spot 353 consists of a salt bridge between Lys353 and Asp38 also buried in a hydrophobic environment. In civet SARS-CoV RBD (year 2002), residue 487 is a serine, which cannot provide favorable support for hot spot 353. In human SARS-CoV isolated in year 2002, residue 487 is a threonine, which strengthens the structural stability of hot spot 353. The S487T mutation adds the favorable interaction at the RBD-human ACE2 interface, enhances viral binding to human ACE2, and plays a critical role in the human-to-human transmission of SARS-CoV (24–26). In human SARS-CoV isolated in year 2003, residue 487 is a serine and there was no human-to-human transmission for this SARS-CoV strain. Asn501 in 2019-nCoV RBD provides more support to hot spot 353 than Ser487 but less than Thr487. This analysis suggests that 2019-nCoV recognizes human ACE2 less efficiently than human SARS-CoV (year 2002) but more efficiently than human SARS-CoV (year 2003). Hence, at least when considering the ACE2-RBD interactions, 2019-nCoV has gained some capability to transmit from human to human.

Third, residues 455, 486, and 494 are leucine, phenylalanine, and serine in 2019-nCoV RBD, respectively (corresponding to residues 442, 472, and 480 in SARS-CoV, respectively) (Fig. 1B to D). Based on our previous structural analysis, these three residues in SARS-CoV RBD play significant roles, albeit not as dramatic as residues 479 and 487, in ACE2 binding (24–26). More specifically, Tyr442 of human and civet SARS-CoV RBDs provides unfavorable interactions with hot spot 31 on human ACE2 (this residue has been mutated to Phe442 in the optimized RBD); Leu455 of 2019-nCoV RBD provides favorable interactions with hot spot 31, hence enhancing viral binding to human ACE2. Leu472 of human and civet SARS-CoV RBDs provides favorable support for hot spot 31 on human ACE2 through hydrophobic interactions with ACE2 residue Met82 and several other hydrophobic residues (this residue has been mutated to Phe472 in the optimized RBD); Phe486 of 2019-nCoV RBD provides even more support for hot spot 31, hence also enhancing viral binding to human ACE2. Asp480 of human and civet SARS-CoV RBDs provides favorable support for hot spot 353 on human ACE2 through a neighboring tyrosine (this residue remains as an aspartate in the optimized RBD); Ser494 in 2019-nCoV RBD still provides positive support for hot spot 353, but the support is not as favorable as that provided by Asp480. Overall, Leu455, Phe486, and Ser494 of 2019-nCoV RBD support the idea that 2019-nCoV recognizes human ACE2 and infects human cells.
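The correspondence between the five critical SARS-CoV RBM positions and their 2019-nCoV counterparts, as described in the three preceding paragraphs, can be captured in a small lookup table. The sketch below is only an illustrative encoding: the class and function names are hypothetical, and the residue identities and hot-spot assignments are taken from the text above (human SARS-CoV, year 2002, versus 2019-nCoV).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriticalResidue:
    sars_pos: int          # position in SARS-CoV RBD numbering
    sars_human_2002: str   # residue in human SARS-CoV (year 2002)
    ncov_pos: int          # corresponding position in 2019-nCoV RBD
    ncov_residue: str      # residue in 2019-nCoV
    hot_spot: int          # human ACE2 hot spot the residue relates to (31 or 353)


# Values as stated in the text; this table is an illustration, not new data.
CRITICAL = [
    CriticalResidue(442, "Tyr", 455, "Leu", 31),
    CriticalResidue(472, "Leu", 486, "Phe", 31),
    CriticalResidue(479, "Asn", 493, "Gln", 31),
    CriticalResidue(480, "Asp", 494, "Ser", 353),
    CriticalResidue(487, "Thr", 501, "Asn", 353),
]


def sars_to_ncov(sars_pos: int) -> int:
    """Map a critical SARS-CoV RBD position to its 2019-nCoV position."""
    for residue in CRITICAL:
        if residue.sars_pos == sars_pos:
            return residue.ncov_pos
    raise KeyError(f"position {sars_pos} is not one of the five critical residues")


def near_hot_spot(hot_spot: int) -> list:
    """Return the critical residues associated with a given ACE2 hot spot."""
    return [r for r in CRITICAL if r.hot_spot == hot_spot]


for r in CRITICAL:
    print(f"{r.sars_human_2002}{r.sars_pos} -> {r.ncov_residue}{r.ncov_pos} (hot spot {r.hot_spot})")
```

A lookup table is used rather than a fixed numbering shift because the offset between the two numbering schemes is not the same at every position (13 at position 442, 14 at the other four).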

Last, having analyzed the interactions between 2019-nCoV RBD and human ACE2, we asked how 2019-nCoV RBD interacts with putative ACE2 receptor orthologues from other animal species. Compared to human ACE2, both hot spot 31 and hot spot 353 on civet ACE2 have changed significantly (Fig. 4A). Specifically, residue 31 of civet ACE2 becomes a threonine, which can no longer form a salt bridge with Glu35; residue 38 of civet ACE2 becomes a glutamate, which forms a strong bifurcated salt bridge with Lys353 and no longer needs strong support from neighboring residues. A previously designed SARS-CoV RBD is optimal for binding to civet ACE2 (Fig. 1B and Fig. 4B) (26). In this designed RBD, Tyr442 forms a hydrogen bond with Thr31 of civet ACE2, and Arg479 forms a strong bifurcated salt bridge with Glu35 of civet ACE2. Moreover, in the designed RBD, Pro472 avoids unfavorable interactions with Thr82 of civet ACE2, and Gly480 does not provide unneeded support for hot spot 353. Furthermore, in the designed RBD, Thr487 provides limited but helpful support for hot spot 353. Here, we constructed a structural model for the complex of 2019-nCoV RBD and civet ACE2 (Fig. 4C). Based on this model, Phe486 of 2019-nCoV RBD forms a moderately unfavorable interaction with the polar side chain of Thr82 of civet ACE2, and Leu455 and Gln493 would lose favorable interactions with civet ACE2, but they would still be compatible with civet ACE2. Thus, 2019-nCoV likely still uses civet ACE2 as its receptor, although it appears that 2019-nCoV RBD has not evolved adaptively for civet ACE2 binding. Moreover, 2019-nCoV likely does not use mouse or rat ACE2 as its receptor because mouse or rat ACE2 contains a histidine at the 353 position, which does not fit into the virus-receptor interaction as well as a lysine does (Fig. 4A). 2019-nCoV RBD likely recognizes ACE2 from pigs, ferrets, cats, orangutans, monkeys, and humans with similar efficiencies, because these ACE2 molecules are identical or similar in the critical virus-binding residues. The situation involving bat ACE2 is complex because of the diversity of bat species (29). Based on the sequence of ACE2 from Rhinolophus sinicus bats (which can be recognized by bat SARS-CoV strain Rs3367), 2019-nCoV RBD likely also recognizes bat ACE2 as its receptor. Overall, 2019-nCoV likely recognizes ACE2 orthologues from a diversity of species, except for mouse and rat ACE2 (which should be poor receptors for 2019-nCoV).
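The species comparison above rests on a handful of virus-contacting ACE2 positions. As a rough sketch, the snippet below records only the residues that the text states explicitly (human and civet at positions 31, 35, 38, 82, and 353; mouse and rat at position 353) and reports where an orthologue differs from human ACE2. The dictionary layout and function names are hypothetical, entries for mouse and rat are deliberately partial because the text gives only position 353, and the histidine check is merely the single criterion named in the paragraph, not a full binding prediction.

```python
# Virus-contacting ACE2 residues as stated in the text (partial by design).
HUMAN_ACE2 = {31: "Lys", 35: "Glu", 38: "Asp", 82: "Met", 353: "Lys"}

OTHER_ACE2 = {
    "civet": {31: "Thr", 35: "Glu", 38: "Glu", 82: "Thr", 353: "Lys"},
    "mouse": {353: "His"},  # only position 353 is given in the text
    "rat": {353: "His"},    # only position 353 is given in the text
}


def differences_from_human(species: str) -> dict:
    """Map position -> (human residue, species residue) for recorded positions that differ."""
    recorded = OTHER_ACE2[species]
    return {
        pos: (HUMAN_ACE2[pos], res)
        for pos, res in sorted(recorded.items())
        if HUMAN_ACE2[pos] != res
    }


def has_his353(species: str) -> bool:
    """True if the orthologue carries His at position 353, the feature the text
    links to poor receptor function for mouse and rat ACE2."""
    return OTHER_ACE2[species].get(353) == "His"


for name in OTHER_ACE2:
    print(name, differences_from_human(name), "His353:", has_his353(name))
```

Extending the table to pig, ferret, cat, orangutan, monkey, or bat ACE2 would require the residue identities shown in Fig. 4A, which are not reproduced in the text.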

FIG 4

Structural analysis of animal ACE2 recognition by 2019-nCoV and SARS-CoV. (A) Critical changes in virus-contacting residues of ACE2 from different host species. GenBank accession numbers for ACE2 are as follows: NM_001371415.1 (human), AAX63775.1 (civet), KC881004.1 (bat), NP_001123985.1 (mouse), AY881244 (rat), NP_001116542.1 (pig), AB208708 (ferret), NM_001039456 (cat), Q5RFN1 (orangutan), and AY996037 (monkey). (B) Experimentally determined structure of the interface between a designed SARS-CoV RBD (optimized for civet ACE2 recognition) and civet ACE2. PDB ID is 3SCK. (C) Modeled structure of the interface between 2019-nCoV RBD and civet ACE2. Here, mutations were introduced to the RBD region in panel B based on sequence differences between SARS-CoV and 2019-nCoV.

DISCUSSION

Atomic-level resolution of complex virus-receptor interactions provides new opportunities for predictive biology. In this instance, we used prior knowledge gleaned from multiple SARS-CoV strains (isolated from different hosts in different years) and ACE2 receptors (from different animal species) to model predictions for novel 2019-nCoV. Our structural analyses confidently predict that 2019-nCoV uses ACE2 as its host receptor, consistent with two other new publications (30, 31). Compared to previously isolated SARS-CoV strains, 2019-nCoV likely uses human ACE2 less efficiently than human SARS-CoV (year 2002) but more efficiently than human SARS-CoV (year 2003). Because ACE2-binding affinity has been shown to be one of the most important determinants of SARS-CoV infectivity, 2019-nCoV has evolved the capability to infect humans and some capability to transmit among humans. Alarmingly, our data predict that a single N501T mutation (corresponding to the S487T mutation in SARS-CoV) may significantly enhance the binding affinity between 2019-nCoV RBD and human ACE2. Thus, 2019-nCoV evolution in patients should be closely monitored for the emergence of novel mutations at the 501 position (to a lesser extent, also the 494 position).

What is the source of 2019-nCoV, and did a key intermediate host play an important role in the current 2019-nCoV outbreak? Similar to SARS-CoV, 2019-nCoV most likely originated from bats, given its close phylogenetic relationship with other β-genus lineage b bat SARS-CoV (Fig. 2). Moreover, 2019-nCoV likely recognizes ACE2 from a diversity of animal species, including palm civets, as its receptor. In the case of SARS-CoV, some of its critical RBM residues were adapted to human ACE2, while some others were adapted to civet ACE2 (26); this type of partial viral adaptation to two host species promoted virus replication and cross-species transmission between the two host species. In the case of 2019-nCoV, however, there is no strong evidence for adaptive mutations in its critical RBM residues that specifically promote viral binding to civet ACE2. Hence, either palm civets were not intermediate hosts for 2019-nCoV, or they passed 2019-nCoV to humans quickly before 2019-nCoV had any chance to adapt to civet ACE2. Like SARS-CoV, 2019-nCoV will likely replicate inefficiently in mice and rats, ruling them out as intermediate hosts for 2019-nCoV. Moreover, we predict that either 2019-nCoV or laboratory mice and rats would need to be genetically engineered before a robust mouse or rat model for 2019-nCoV would become available. Pigs, ferrets, cats, and nonhuman primates contain largely favorable 2019-nCoV-contacting residues in their ACE2 and hence may serve as animal models or intermediate hosts for 2019-nCoV. It is worth noting that SARS-CoV was isolated from wild palm civets near Wuhan in 2005 (9), and its RBD had already been well adapted to civet ACE2 (except for residue 487). Thus, bats and other wild animals in and near Wuhan should be screened for both SARS-CoV and 2019-nCoV.

The above analyses are based on the modeling of 2019-nCoV RBD-ACE2 interactions, heavily grounded in a series of atomic-level structures of SARS-CoV isolated from different hosts in different years (18, 24–26). There are certainly other factors that affect the infectivity and pathogenesis of 2019-nCoV and will need to be investigated. Nevertheless, our decade-long structural studies on SARS-CoV have firmly shown that receptor recognition by SARS-CoV is one of the most important determinants of its cross-species and human-to-human transmissions, a conclusion that has been confirmed by different lines of research (13, 14). One of the long-term goals of our previous structural studies on SARS-CoV was to build an atomic-level iterative framework of virus-receptor interactions that facilitates epidemic surveillance, predicts species-specific receptor usage, and identifies potential animal hosts and likely animal models of human diseases. This study provides a robust test of this reiterative framework, providing the basic, translational, and public health research communities with predictive insights that may help study and battle this novel 2019-nCoV.

MATERIALS AND METHODS

Structural analysis.

Coot software was used to introduce mutations into structural models (32). PyMOL software (The PyMOL Molecular Graphics System, version 1.5.0.4, Schrödinger, LLC) was used to prepare structural figures.

Phylogenetic analysis.

Consensus radial phylograms were generated in Geneious Prime (v.2020.0.3), with the Jukes-Cantor genetic distance model, the neighbor-joining build method, and no outgroup, with 100 bootstrap replicates. Phylograms were rendered for publication in Adobe Illustrator CC 2020.
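For readers unfamiliar with the Jukes-Cantor model named above, the sketch below shows the standard correction that converts an observed fraction of differing sites into an estimated evolutionary distance, written in its general form for an alphabet of `n_states` characters (4 for nucleotides, 20 for amino acids). The function name and the default of 20 states are assumptions made for this illustration; the source does not describe how Geneious Prime implements the model internally, and the resulting distance matrix would then be passed to a neighbor-joining routine, which is not shown here.

```python
import math


def jukes_cantor_distance(p: float, n_states: int = 20) -> float:
    """Jukes-Cantor corrected distance from the observed proportion of differing sites.

    d = -b * ln(1 - p / b), with b = (n_states - 1) / n_states.
    Assumption: n_states=20 treats the alignment as protein; use 4 for nucleotides.
    """
    b = (n_states - 1) / n_states
    if not 0.0 <= p < b:
        # At or beyond saturation the correction is undefined.
        raise ValueError(f"p must be in [0, {b}) for {n_states} states")
    return -b * math.log(1.0 - p / b)


# The correction grows faster than the raw difference as sequences diverge.
for p in (0.05, 0.25, 0.50):
    print(f"observed p = {p:.2f} -> corrected d = {jukes_cantor_distance(p):.3f}")
```

The corrected distance always equals or exceeds the observed proportion, reflecting multiple substitutions at the same site that a raw count cannot see.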

Sequence alignment.

Protein sequence alignments were done using Clustal Omega (33).

ACKNOWLEDGMENTS

This work was supported by NIH grants R01AI089728 and R01AI110700 (to F.L. and R.S.B.).

We thank Shanghai Public Health Clinical Center and School of Public Health, Central Hospital of Wuhan, Huazhong University of Science and Technology, Wuhan Center for Disease Control and Prevention, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control, and University of Sydney Australia for releasing the sequence of the 2019-nCoV genome.

REFERENCES