T cell receptor repertoires of mice and humans are clustered in similarity networks around conserved public CDR3 sequences
doi: 10.7554/eLife.22057.
Asaf Poran, Eric Shifrut, Shlomit Reich-Zeliger, Erez Greenstein, Irena Zaretsky, Tomer Arnon, Francois Van Laethem, Alfred Singer, Jinghua Lu, Peter D Sun, Irun R Cohen, Nir Friedman
- PMID: 28731407
- PMCID: PMC5553937
- DOI: 10.7554/eLife.22057
Asaf Madi et al. Elife. 2017.
Abstract
Diversity of T cell receptor (TCR) repertoires, generated by somatic DNA rearrangements, is central to immune system function. However, the level of sequence similarity of TCR repertoires within and between species has not been characterized. Using network analysis of high-throughput TCR sequencing data, we found that abundant CDR3-TCRβ sequences were clustered within networks generated by sequence similarity. We discovered a substantial number of public CDR3-TCRβ segments that were identical in mice and humans. These conserved public sequences were central within TCR sequence-similarity networks. Annotated TCR sequences, previously associated with self-specificities such as autoimmunity and cancer, were linked to network clusters. Mechanistically, CDR3 networks were promoted by MHC-mediated selection, and were reduced following immunization, immune checkpoint blockade or aging. Our findings provide a new view of T cell repertoire organization and physiology, and suggest that the immune system distributes its TCR sequences unevenly, attending to specific foci of reactivity.
Keywords: CDR3; T cell receptor; computational biology; human; immunology; lymphocyte subsets; mouse; systems biology.
Conflict of interest statement
The authors declare that no competing interests exist.
Figures
Figure 1. Mouse and human TCR repertoires manifest dense similarity networks surrounding public CDR3β sequences.
(A) Networks formed by the thousand most frequent CDR3 AA sequences expressed in the TCRβ repertoire of splenic CD4 T cells from a single mouse. Nodes (CDR3 AA sequences) were connected by edges defined by a Levenshtein distance of 1 (one AA substitution, insertion or deletion). Node size reflects its log frequency (scale at the bottom). The nodes are colored according to their sharing levels in a reference dataset of 28 mice (Madi et al., 2014), from private CDR3 sequences (white, found in only one mouse in the reference dataset) to public (black, shared by all 28 mice). Inset shows an enlargement of the marked cluster with labeled CDR3β AA sequences (nodes) and edges, which represent a Levenshtein distance of 1 between connected nodes. (B) Networks formed by a thousand CDR3β sequences randomly chosen from the repertoire of a single mouse. (C) A network formed by the thousand most frequent CDR3 AA sequences in the TCRβ repertoire of a representative human subject (data from [Britanova et al., 2014]). Nodes are colored by their degree of sharing among the 11 young subjects in that study (ages 6–25 years). (D) Mean degree of node connectivity as a function of sharing level in a network formed by the top 1000 CDR3 sequences (blue) or by 1000 randomly chosen sequences (orange). Error bars indicate standard error (SE) across the 12 mice used in this study. DOI:
http://dx.doi.org/10.7554/eLife.22057.002
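The network construction described in this caption can be sketched in a few lines. The following is a minimal illustration, not the authors' pipeline: the function names and the toy CDR3 strings are hypothetical, and the all-pairs comparison is chosen for readability rather than speed.

```python
from itertools import combinations


def within_one_edit(a: str, b: str) -> bool:
    """Return True if the Levenshtein distance between a and b is exactly 1."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    # Advance past the common prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # One substitution: everything after the mismatch must agree.
        return a[i + 1:] == b[i + 1:]
    # One insertion/deletion: skip a single character of the longer string.
    return a[i:] == b[i + 1:]


def build_network(sequences):
    """Map each unique sequence to the set of its distance-1 neighbours."""
    unique = sorted(set(sequences))
    adjacency = {s: set() for s in unique}
    for s, t in combinations(unique, 2):
        if within_one_edit(s, t):
            adjacency[s].add(t)
            adjacency[t].add(s)
    return adjacency


# Hypothetical toy sequences (not taken from the study's data).
toy_sequences = ["CASSLGQYF", "CASSLGEYF", "CASSLGYF", "CASSPDRGAF"]
network = build_network(toy_sequences)
degree = {seq: len(neighbours) for seq, neighbours in network.items()}
```

Here `degree` is the per-node connectivity of the kind averaged in panel (D). For 1000 sequences the quadratic pair loop is still cheap; larger inputs would call for an indexed neighbour search.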
Figure 1—figure supplement 1. Mean number of clustered nodes as a function of the sample size selected for generating the network.
(Right panel is a zoomed-in version of the left panel). Results are shown for 4 representative conditions, with different levels of observed network connectivity, as expressed by the number of clustered nodes (degree >0). These graphs show that regardless of sample size, (A, B) networks from a naïve mouse are the most connected, followed by those of immunized (p277), aged mice, and lastly p277 in vitro stimulation, which is the least connected. (C, D) networks for 39 human samples (Britanova et al., 2014) divided into 4 age groups. Above ~1000 sequences, the trend is linear; hence the relative fraction of clustered nodes is not sensitive to sample size. Thus, our analysis of network connectivity is not sensitive to the number of sequences used. DOI:
http://dx.doi.org/10.7554/eLife.22057.003
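As a rough illustration of this sample-size check, the sketch below counts clustered nodes (degree > 0) among the top-N most frequent sequences for several values of N. It assumes a dictionary of sequence counts as input; the names, the toy counts, and the wildcard-bucket neighbour search are illustrative choices, not the method reported in the paper.

```python
from collections import defaultdict


def clustered_nodes(sequences):
    """Return the sequences that have at least one neighbour at Levenshtein distance 1."""
    unique = set(sequences)
    clustered = set()
    buckets = defaultdict(set)
    for s in unique:
        for i in range(len(s)):
            # Sequences differing by one substitution share a wildcard key.
            buckets[s[:i] + "*" + s[i + 1:]].add(s)
            # Sequences differing by one indel: deleting one residue of the
            # longer sequence yields the shorter one.
            shorter = s[:i] + s[i + 1:]
            if shorter in unique:
                clustered.update((s, shorter))
    for members in buckets.values():
        if len(members) > 1:
            clustered.update(members)
    return clustered


def clustered_vs_sample_size(counts, sizes):
    """Count clustered nodes among the top-n most frequent sequences, for each n."""
    ranked = sorted(counts, key=counts.get, reverse=True)
    return {n: len(clustered_nodes(ranked[:n])) for n in sizes}


# Hypothetical toy counts (sequence -> read count).
toy_counts = {
    "CASSLGQYF": 50,
    "CASSPDRGAF": 40,
    "CASSLGEYF": 30,
    "CASSLGYF": 20,
    "CAWSLEGNTEVFF": 10,
}
curve = clustered_vs_sample_size(toy_counts, sizes=(2, 3, 4, 5))
```

Dividing each count by its N gives the relative fraction of clustered nodes that the caption reports as insensitive to sample size above about 1000 sequences.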
Figure 1—figure supplement 2. CDR3β sequences form networks with clusters dominated by J-genes and heterogeneous for V-genes.
An example of a network constructed from the 1000 most abundant CDR3β AA sequences from a single mouse. Both panels show the same network. In the left panel, nodes are colored by the dominating J-gene; in the right panel color indicates the dominating V-gene for each AA sequence. Network clusters mostly consist of a single J-gene, with only a few clusters featuring two or three primary J-genes (left). In contrast, V-gene usage in clusters is heterogeneous, with no obvious dominating gene segment (right). This pattern of clusters with homogenous J-gene and heterogeneous V-gene usage was consistent in all top 1000 CDR3β AA sequence networks we examined. DOI:
http://dx.doi.org/10.7554/eLife.22057.004
Figure 1—figure supplement 3. CD8+ T cell networks formed by the thousand most frequent CDR3 AA sequences expressed in two mice.
Nodes (CDR3 AA sequences) were connected by edges defined by a Levenshtein distance of 1. DOI:
http://dx.doi.org/10.7554/eLife.22057.005
Figure 1—figure supplement 4. Networks from C3H.HeSnJ mouse strain bearing the H2k MHC haplotype.
CD4+ T cell networks formed by the thousand most frequent CDR3 AA segments expressed in two mice. Nodes (CDR3 AA sequences) were connected by edges defined by a Levenshtein distance of 1. DOI:
http://dx.doi.org/10.7554/eLife.22057.006
Figure 1—figure supplement 5. Evaluating the level of node centrality vs. sharing level.
The mean betweenness centrality is presented as a function of the sharing level in the dataset of 28 mice, for networks composed of the 1000 most frequent CDR3 AA sequences and for networks composed of 1000 randomly selected CDR3 AA sequences from the dataset. Error bars indicate standard error (SE) across the 12 mice used in this study. DOI:
http://dx.doi.org/10.7554/eLife.22057.007
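The caption does not say how betweenness centrality was computed. As one possible sketch, the code below uses Brandes' algorithm for unweighted, undirected graphs (unnormalized) and then averages the result per sharing level. The adjacency dictionary, node labels, and sharing levels are hand-written toy values.

```python
from collections import defaultdict, deque


def betweenness_centrality(adjacency):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    centrality = {v: 0.0 for v in adjacency}
    for source in adjacency:
        order = []                           # nodes in order of discovery
        preds = {v: [] for v in adjacency}   # shortest-path predecessors
        sigma = {v: 0 for v in adjacency}    # number of shortest paths
        dist = {v: -1 for v in adjacency}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to the source.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in centrality.items()}


def mean_by_sharing_level(values, sharing):
    """Average a per-node value within each sharing level."""
    grouped = defaultdict(list)
    for node, value in values.items():
        grouped[sharing[node]].append(value)
    return {level: sum(v) / len(v) for level, v in sorted(grouped.items())}


# Hypothetical star-shaped cluster: one hub with three neighbours.
toy_adjacency = {
    "hub": {"leaf1", "leaf2", "leaf3"},
    "leaf1": {"hub"},
    "leaf2": {"hub"},
    "leaf3": {"hub"},
}
toy_sharing = {"hub": 28, "leaf1": 1, "leaf2": 1, "leaf3": 3}
bc = betweenness_centrality(toy_adjacency)
mean_bc = mean_by_sharing_level(bc, toy_sharing)
```

In this toy graph the hub lies on every shortest path between leaves, so the most shared node also has the highest betweenness, which is the qualitative pattern the figure reports for top-1000 networks.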
Figure 1—figure supplement 6. Node centrality vs. sharing level in human TCRβ repertoires.
TCRβ repertoires of 11 healthy young human subjects previously investigated by Britanova et al. (2014). Shown is the mean degree of nodes as a function of their sharing level in the dataset, for networks composed of the 1000 most frequent CDR3 AA sequences and for networks composed of 1000 randomly selected sequences. Note that public human TCRs manifest a higher degree of connectivity than do private TCRs. DOI:
http://dx.doi.org/10.7554/eLife.22057.008
Figure 2. TCR repertoires are focused around public and cross species- (CS-) public CDR3 AA sequences shared by mice and humans.
(A) Human (left) or mouse (right) CDR3 sequences are grouped according to their sharing level in the corresponding dataset. For each sharing group, we plotted the percentage of sequences that were shared by at least one subject of the other species. (B) Examples of CS-public CDR3 sequences, and their V and J segments in mouse and human repertoires. (C) A network formed by the top 1000 CDR3 sequences of a single human subject. Node color represents its sharing within or between species: pink, shared by all 11 human subjects; green, shared by at least 25 of the 28 mice; black, CS-public nodes shared by all 11 humans and at least 25 mice; blue, not shared. (D) The mean number of edges per node (degree) in the 11 human and 28 mouse networks, subdivided into the four categories as in C. Error bars mark SE. DOI:
http://dx.doi.org/10.7554/eLife.22057.009
Figure 2—figure supplement 1. Cross-species TCR sharing.
(A) All CDR3β sequences in the 28 mouse dataset were categorized according to their sharing level, from private (found in only one mouse), to public (found in all 28 mice). The graph presents the percent of sequences within each category that were also found in the human dataset (in at least 1 of 11 young subjects). (B) All CDR3β sequences in the 11 young human subjects were categorized according to their sharing level, from private (found in only one subject), to public (found in all subjects). The graph presents the percent of sequences within each group that were also found in at least one of the 28 mice. In both cases, the fraction of cross-species sequences increases with the sharing level; sequences that are more public in one species are more frequently found in the other species. DOI:
http://dx.doi.org/10.7554/eLife.22057.010
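The sharing-level calculation in this caption amounts to counting, for each CDR3 sequence, how many individuals carry it, and then asking what percentage of each sharing category also occurs in the other species. A minimal sketch follows, assuming each repertoire is given as a set of AA sequences; all names and the toy repertoires are hypothetical.

```python
from collections import Counter, defaultdict


def sharing_levels(repertoires):
    """Map each sequence to the number of individuals in which it was found."""
    counts = Counter()
    for repertoire in repertoires:
        counts.update(set(repertoire))  # count each individual at most once
    return dict(counts)


def percent_cross_species(levels, other_species_sequences):
    """Percent of sequences at each sharing level also seen in the other species."""
    totals = defaultdict(int)
    found = defaultdict(int)
    for sequence, level in levels.items():
        totals[level] += 1
        if sequence in other_species_sequences:
            found[level] += 1
    return {level: 100.0 * found[level] / totals[level] for level in sorted(totals)}


# Hypothetical toy repertoires: three "mice" and a pooled "human" set.
toy_mice = [
    {"CASSLGQYF", "CASSLGEYF", "CASSPDRGAF"},
    {"CASSLGQYF", "CASSLGEYF"},
    {"CASSLGQYF", "CAWSLEGNTEVFF"},
]
toy_human_pool = {"CASSLGQYF", "CASSPDRGAF"}

mouse_levels = sharing_levels(toy_mice)
cross_species = percent_cross_species(mouse_levels, toy_human_pool)
```

With the real data, the level would range from 1 (private) to 28 for mice or 11 for humans, and the pooled set would be the union of all repertoires of the other species.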
Figure 2—figure supplement 2. Sharing properties of the 86 observed CS-public CDR3 sequences in simulated data.
We generated 100 datasets of simulated human and mouse repertoires, with number of individuals (11 humans, 28 mice) and repertoire sizes as in the experimental data. For each of the 86 observed CS-public sequences, we plot its mean sharing level in the simulations, for human (red) and mouse (blue) repertoires. The top panel shows 54 sequences that are CS-public in both experiment and simulations. The lower panel shows 32 sequences that are CS-public in the experimental data but not in the simulations. Note that there were additionally about 200 CS-public sequences in the simulations which were not CS-public in the data. DOI:
http://dx.doi.org/10.7554/eLife.22057.011
Figure 2—figure supplement 3. CS-Public CDR3 sequences are central in mouse TCRβ networks.
Shown is a representative network of the 1000 most frequent sequences from a mouse. Nodes are labeled according to 4 categories: CDR3 sequences that are not public; CDR3 sequences shared by all 11 human samples; CDR3 sequences shared by at least 25 mice; CDR3 sequences shared by at least 25 mice and all 11 humans. DOI:
http://dx.doi.org/10.7554/eLife.22057.012
Figure 2—figure supplement 4. Degree of CS-public sequences is correlated in mouse and human TCR networks.
Each dot represents one CS-public sequence that is found among the most abundant 1000 sequences in at least one mouse and at least one human subject (n = 45 sequences). There is a significant correlation between the degree of CS-public sequences in the two species (R = 0.65, Spearman); sequences that are more connected in one species are typically more connected in the other as well. DOI:
http://dx.doi.org/10.7554/eLife.22057.013
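For readers unfamiliar with the statistic, Spearman's correlation is the Pearson correlation of the ranks of the two variables. The sketch below computes it with average ranks for ties; the degree values are invented for illustration and are not the 45 data points plotted in the figure.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical degrees of the same five sequences in a mouse and a human network.
toy_mouse_degree = [0, 1, 3, 5, 2]
toy_human_degree = [1, 0, 4, 6, 2]
rho = spearman(toy_mouse_degree, toy_human_degree)
```

The function returns the coefficient only; a significance test, as reported in the caption, would need an additional step that is not shown here.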
Figure 3. Public CDR3 sequences form highly connected similarity networks in mice and humans and are enriched for self-associated immune reactivities.
(A) A network formed by the 1000 most shared mouse CDR3 sequences (found in >25 of 28 mice). Node size corresponds to the mean abundance of the sequence. Nodes are colored according to their cluster association. 124 CDR3 sequences that were previously annotated (see [Madi et al., 2014]) were added to the network and are presented as arrowheads. 63 annotated sequences were either identical to, or at a Levenshtein distance of 1 from one of the nodes, and are listed next to each cluster (with the corresponding color). Annotations of 61 un-clustered sequences are also listed. (B) A network formed by the 1000 most frequent public CDR3 sequences in humans (found in all 11 subjects). Previously annotated mouse (n = 124) and human (n = 30) CDR3 sequences were added to the network as in A (arrowheads). The clusters were distinctly colored in order to visually match between clusters and their annotated sequences, not to define antigen specificity of a cluster. A list of linked annotated CDR3 sequences is shown next to each cluster (11 of 30 human and 23 of 124 mouse annotated CDR3 sequences), together with a list of unclustered annotated human sequences. DOI:
http://dx.doi.org/10.7554/eLife.22057.014
Figure 3—figure supplement 1. Public CDR3 sequences form highly connected similarity networks in mice and are enriched for self-associated immune reactivities.
Sequence visualization of the red (top right) cluster in the mouse CDR3 sequences network shown in Figure 3A. The original full network is formed by the 1000 most shared mouse CDR3 sequences (found in >25 of 28 mice). 124 CDR3 sequences that were previously annotated (see [Madi et al., 2014]) were added to the network and are presented as red arrowheads. 13 annotated sequences were either identical to, or at a Levenshtein distance of 1 from one of the nodes in this cluster, and their associated pathology/antigen is listed next to the corresponding node. DOI:
http://dx.doi.org/10.7554/eLife.22057.015
Figure 4. MHC-dependent public CDR3 sequences form highly connected similarity networks.
(A) Mean number of clustered nodes in networks formed by the top 1000 CDR3 sequences from the following repertoires: DN thymocytes (CD4−CD8−) (n = 3), CD4+ spleen T cells (n = 3), Quad-KO mice (Van Laethem et al., 2007), which lack MHC-I, MHC-II, CD4 and CD8 (n = 4), and their WT controls (C57BL/6) (n = 4). Error bars signify standard error. (B) Cumulative frequency of the 86 CS-public CDR3 sequences (observed in the reference datasets of 28 WT mice and 11 healthy humans) is shown for: DN thymocytes (CD4−CD8−) (n = 3), CD4+ spleen T cells (n = 3) (left), Quad-KO mice (n = 4), and their WT controls (C57BL/6) (n = 4). Error bars signify standard error. (C) Cumulative frequency of nucleotide sequences coding for two annotated (C9 and COPD, top) and two unknown (bottom) public AA CDR3 sequences from repertoires of DN thymocytes and CD4+ spleen T cells (sequences from 3 mice are shown). Each color represents a different nucleotide sequence. DOI:
http://dx.doi.org/10.7554/eLife.22057.016
Figure 4—figure supplement 1. DN thymocytes manifest lower convergent recombination.
Comparison of the number of nt sequences encoding, on average, an AA CDR3 sequence, for public CDR3 AA sequences, found to be shared by more than 25 out of 28 mice in the reference dataset. Public CDR3 sequences coming from DN thymocytes were encoded on average by a lower number of nucleotide (nt) sequences compared to those from CD4+ splenic T cells (p<2.2e-16 for each of these top sharing levels). DOI:
http://dx.doi.org/10.7554/eLife.22057.017
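Convergent recombination, as used in this caption, is the number of distinct nucleotide sequences that encode the same CDR3 AA sequence. A minimal sketch of that count is given below, assuming the input is a list of (nucleotide, amino acid) pairs that have already been translated; the function names and the short toy sequences are hypothetical.

```python
from collections import defaultdict


def nt_variants_per_aa(clonotypes):
    """Count the distinct nucleotide sequences observed for each AA sequence."""
    variants = defaultdict(set)
    for nt_sequence, aa_sequence in clonotypes:
        variants[aa_sequence].add(nt_sequence)
    return {aa: len(nts) for aa, nts in variants.items()}


def mean_convergence(clonotypes, public_aa_sequences):
    """Mean number of encoding nt sequences over the public AA sequences observed."""
    counts = nt_variants_per_aa(clonotypes)
    observed = [counts[aa] for aa in public_aa_sequences if aa in counts]
    return sum(observed) / len(observed) if observed else 0.0


# Hypothetical toy clonotypes: (nucleotide sequence, translated AA sequence).
# The three CASS entries use synonymous codons; one is listed twice on purpose.
toy_clonotypes = [
    ("TGTGCCAGCAGT", "CASS"),
    ("TGCGCCAGCAGT", "CASS"),
    ("TGTGCTAGCAGC", "CASS"),
    ("TGTGCCAGCAGT", "CASS"),
    ("TGTGCCAGC", "CAS"),
]
per_aa = nt_variants_per_aa(toy_clonotypes)
mean_nt = mean_convergence(toy_clonotypes, {"CASS", "CAS"})
```

Comparing `mean_nt` between two cell populations, restricted to the same set of public AA sequences, corresponds to the comparison drawn in the figure between DN thymocytes and CD4+ splenic T cells.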
Figure 5. Immunization, in vitro antigen re-stimulation, anti-CTLA4 antibody treatment and aging perturb TCR networks coupled with an increase in repertoire skewness.
(A–C) Networks of the thousand most frequent CDR3 sequences are shown for (A) a naïve mouse, (B) a mouse immunized with a self-peptide (p277), and (C) T cells from the spleen of an immunized mouse, which were re-stimulated in vitro with the p277 peptide. (D) Mean number of clustered nodes in networks formed by the top 1000 CDR3 sequences from the following repertoires: Left: naïve mice (n = 12); p277-immunized mice, 7 days post-immunization (n = 5); and in vitro re-stimulated with p277 (n = 5). Right: naïve mice (n = 12); OVA-immunized mice, 7 days post-immunization (n = 5); in vitro re-stimulated with OVA peptide (n = 3); and immunized mice, 2 months post-immunization (n = 5). Error bars indicate standard error. (E) Frequency of the top 1000 most frequent CDR3 sequences by sharing level, for the same repertoires as in (D). Sharing levels were calculated based on sharing in the reference dataset of 28 mice. (F) The Gini coefficient (a measure of repertoire evenness, with higher values indicating a more skewed repertoire) plotted vs. the number of clustered nodes, for the top 1000 CDR3 sequences from the repertoires from (D, E) and from aged mice (n = 3). (G) The Gini coefficient plotted vs. the number of clustered nodes for 39 human samples (Britanova et al., 2014) divided into 4 age groups. (H) The number of clustered nodes (left) and the number of public clonotypes (right, shared by all 11 young human samples in a reference cohort [Britanova et al., 2014]) for the top 1000 most abundant CDR3 sequences in 21 paired samples of patients at baseline and 30 to 60 days after receiving CTLA4 blockade treatment with tremelimumab (data from [Robert et al., 2014]). (I) Number of public clonotypes (defined as in H) out of the top 1000 most abundant CDR3 sequences in either healthy donors (left) or juvenile idiopathic arthritis patients (right). (J) A conceptual figure of the evolution of repertoire structure. In young and healthy individuals the repertoire is focused and even (top-right), with public and CS-public CDR3 sequences at the center of network clusters. Following an immune response, or with aging, the repertoire becomes more skewed and spread in sequence space (bottom-left), due to preferential expansion of private clones at the expense of more public clones. DOI:
http://dx.doi.org/10.7554/eLife.22057.018
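The Gini coefficient used in panels (F) and (G) can be computed directly from clone frequencies. The sketch below uses the standard sorted-values formula; it is 0 when all clones are equally frequent and approaches 1 when a single clone dominates. The function name and the toy frequency lists are illustrative only.

```python
def gini(frequencies):
    """Gini coefficient of non-negative clone frequencies or counts."""
    values = sorted(frequencies)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum_i(i * x_i) / (n * sum_i(x_i)) - (n + 1) / n, with x sorted ascending
    weighted = sum((rank + 1) * value for rank, value in enumerate(values))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical toy repertoires of four clones each.
even_repertoire = [25, 25, 25, 25]
skewed_repertoire = [0, 0, 0, 100]
gini_even = gini(even_repertoire)
gini_skewed = gini(skewed_repertoire)
```

For n clones the maximum value is (n - 1) / n, so with the top 1000 sequences the coefficient can come very close to, but never reach, 1.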
Figure 5—figure supplement 1. Immunization and in vitro antigen stimulation affect network architecture.
(A) The number of edges in networks formed by the 1000 most abundant CDR3 sequences in three TCR datasets: 12 naïve mice; 5 mice immunized with peptide p277 (HSP60 437–460 VLGGGCALLRCIPALDSLTPANED) emulsified in Complete Freund’s Adjuvant (CFA); and 5 mice immunized with p277+CFA whose splenic T cells were stimulated in vitro with peptide p277. (B) The number of edges in networks formed by the 1000 most abundant CDR3 sequences in four TCR datasets: 12 naïve mice; 5 mice immunized with OVA 323–339 peptide (ISQAVHAAHAEINEAGR) in CFA; 3 mice immunized with OVA+CFA whose splenic T cells were stimulated in vitro with the same OVA peptide; and 5 mice immunized with OVA+CFA whose splenic T cells were analyzed 2 months post-immunization. DOI:
http://dx.doi.org/10.7554/eLife.22057.019
Figure 5—figure supplement 2. Mouse TCR networks become less connected with aging.
A comparison of network clusters in young and aged mice. Network representations of the 1000 most frequent clones in (A) young and (B) aged mice. The networks composed of the 1000 most frequent clones in the young mice (n = 3) manifested 590.3 ± 61.9 clustered nodes with 992.7 ± 147.4 edges. In contrast, networks composed of the 1000 most frequent clones in the aged mice (n = 3) had 334.7 ± 63.5 clustered nodes with 362.3 ± 153.8 edges. Nodes are colored according to the sharing level of their corresponding CDR3 sequence in the 28-mouse reference dataset. DOI:
http://dx.doi.org/10.7554/eLife.22057.020
Figure 5—figure supplement 3. Human TCR networks become less connected with aging.
A comparison of network connectivity formed by the thousand most frequent CDR3 AA segments expressed in 39 humans at different ages (data from Britanova et al., 2014). The mean degree was calculated for each human sample and colored according to 4 age groups: 6–25, 34–43, 61–66, and 71–90 years. DOI:
http://dx.doi.org/10.7554/eLife.22057.021
Figure 5—figure supplement 4. With aging, the repertoire becomes more skewed and spread in sequence space due to preferential expansion of private clones at the expense of more public clones.
Frequency of the top 1000 most frequent CDR3 sequences by sharing level for young (6–8 weeks, n = 3) and aged (17–20 months, n = 3) mice. DOI:
http://dx.doi.org/10.7554/eLife.22057.022
Figure 5—figure supplement 5. CTLA4 blockade results in a repertoire that is more skewed and spread in sequence space, due to preferential expansion of private clones at the expense of more public clones.
The cumulative frequency (in %) of relatively private CDR3 sequences from the top 1000 most frequent sequences in the repertoires of patients before and after CTLA4 blockade treatment with tremelimumab (Robert et al., 2014). Sharing was defined by comparison with a reference dataset of CDR3 sequences from 11 young healthy individuals (Britanova et al., 2014): relatively private sequences were defined as CDR3 sequences shared by 0–5 individuals out of 11 in the reference dataset, where 0 indicates a sequence not found in any of the 11 individuals in the reference cohort. There is a significant increase in the frequency of relatively private sequences (p = 0.01947, paired Wilcoxon signed-rank test). DOI:
http://dx.doi.org/10.7554/eLife.22057.023
References
- Britanova OV, Putintseva EV, Shugay M, Merzlyak EM, Turchaninova MA, Staroverov DB, Bolotin DA, Lukyanov S, Bogdanova EA, Mamedov IZ, Lebedev YB, Chudakov DM. Age-related decrease in TCR repertoire diversity measured with deep and normalized sequence profiling. The Journal of Immunology. 2014;192:2689–2698. doi: 10.4049/jimmunol.1302064.
- Burrows SR, Silins SL, Khanna R, Burrows JM, Rischmueller M, McCluskey J, Moss DJ. Cross-reactive memory T cells for Epstein-Barr virus augment the alloresponse to common human leukocyte antigens: degenerate recognition of major histocompatibility complex-bound peptide by T cells and its role in alloreactivity. European Journal of Immunology. 1997;27:1726–1736. doi: 10.1002/eji.1830270720.
Grants and funding
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.