T cell receptor repertoires of mice and humans are clustered in similarity networks around conserved public CDR3 sequences

Abstract

Diversity of T cell receptor (TCR) repertoires, generated by somatic DNA rearrangements, is central to immune system function. However, the level of sequence similarity of TCR repertoires within and between species has not been characterized. Using network analysis of high-throughput TCR sequencing data, we found that abundant CDR3-TCRβ sequences were clustered within networks generated by sequence similarity. We discovered a substantial number of public CDR3-TCRβ segments that were identical in mice and humans. These conserved public sequences were central within TCR sequence-similarity networks. Annotated TCR sequences, previously associated with self-specificities such as autoimmunity and cancer, were linked to network clusters. Mechanistically, CDR3 networks were promoted by MHC-mediated selection, and were reduced following immunization, immune checkpoint blockade or aging. Our findings provide a new view of T cell repertoire organization and physiology, and suggest that the immune system distributes its TCR sequences unevenly, attending to specific foci of reactivity.

DOI: http://dx.doi.org/10.7554/eLife.22057.001

Research Organism: Human, Mouse

Introduction

The T-cell receptor (TCR), which is generated through random rearrangement of genomic V-D-J segments, is the mediator of specific antigen recognition by T lymphocytes. The collective variety of these receptors expressed by an individual, the TCR repertoire, reflects the state of the adaptive immune system and its history, as its composition changes throughout life in response to immune challenges. The individual TCR repertoire is shaped by biases in the process of VDJ recombination (Robins et al., 2010; Miles et al., 2011; Murugan et al., 2012; Ndifon et al., 2012), and by the subsequent expansion and deletion of certain T cell clones upon antigen recognition during T cell development in the thymus, and later in the periphery.

Here, we studied the organization of TCR repertoires using high-throughput TCR sequencing, comparing data from mice and humans. We focused on the CDR3 (complementarity determining region 3) amino acid (AA) sequence of the TCRβ chain, which is the most diverse segment of the TCR and is positioned to interact with the antigenic peptide epitope presented by an MHC molecule (Davis and Bjorkman, 1988). The organization of TCR repertoires of individual mice and humans was evaluated using network analysis, where CDR3 sequences were connected based on their level of sequence similarity.

Results

Initially, we constructed TCR networks from a dataset of TCRβ AA sequences obtained from splenic CD4+ T cells from 12 healthy C57BL/6 mice (Madi et al., 2014). We obtained on average about 30,000 different CDR3 sequences from each mouse, which were found at varying abundances and had an average length of 13.4 ± 1.4 AA (mean ± SD). Figure 1A shows a network obtained using the thousand most frequent CDR3 sequences from a single mouse, which in terms of abundance correspond to 34% of the total sequences obtained for that mouse. CDR3 sequences (nodes) were connected (by edges) if they were separated by one amino acid difference (replacement, addition or deletion of one AA) – a Levenshtein distance of 1 (Levenshtein, 1966). A cluster was defined as a set of two or more nodes that are connected to each other by any number of edges and intermediate nodes (Figure 1A, inset). A similar analysis had previously revealed the existence of networks of B-cell immunoglobulin heavy-chains, which were attributed to clonally derived sequences generated by somatic hypermutation (SHM) (Ben-Hamo and Efroni, 2011; Bashford-Rogers et al., 2013). Our analysis demonstrated the existence of networks also for TCRβ sequences. As T cells do not undergo SHM, other factors must lead to the formation of TCR similarity networks.
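The network definition above can be illustrated with a minimal sketch. It assumes the input is simply a list of CDR3 AA strings; the function names (`within_one_edit`, `build_network`, `find_clusters`) and the toy sequences are hypothetical and are not taken from the authors' pipeline. Edges join sequences at a Levenshtein distance of exactly 1, and clusters are connected components with two or more nodes.

```python
from itertools import combinations


def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by exactly one substitution, insertion or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make `a` the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1  # skip the common prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution at position i
    return a[i:] == b[i + 1:]  # one extra residue in b at position i


def build_network(sequences):
    """Return (nodes, edges) for unique sequences at Levenshtein distance 1."""
    nodes = sorted(set(sequences))
    edges = [(s, t) for s, t in combinations(nodes, 2) if within_one_edit(s, t)]
    return nodes, edges


def find_clusters(nodes, edges):
    """Connected components with at least two nodes (union-find)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, t in edges:
        parent[find(s)] = find(t)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return [g for g in groups.values() if len(g) >= 2]


# Toy input (made-up sequences, for illustration only)
toy = ["CASSLGQYEQYF", "CASSLGGYEQYF", "CASSLGYEQYF", "CASSPDRGAETLYF"]
nodes, edges = build_network(toy)
clusters = find_clusters(nodes, edges)
clustered_nodes = sum(len(c) for c in clusters)
```

The all-pairs comparison is quadratic in the number of sequences, which is acceptable for networks of about a thousand nodes as described here; larger inputs would need an indexed neighbor search.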

Figure 1. Mouse and human TCR repertoires manifest dense similarity networks surrounding public CDR3β sequences.

(A) Networks formed by the thousand most frequent CDR3 AA sequences expressed in the TCRβ repertoire of splenic CD4 T cells from a single mouse. Nodes (CDR3 AA sequences) were connected by edges defined by a Levenshtein distance of 1 (one AA substitution/insertion/deletion). Node size reflects its log frequency (scale at the bottom). The nodes are colored according to their sharing levels in a reference dataset of 28 mice (Madi et al., 2014), from private CDR3 sequences (white, found in only one mouse in the reference dataset) to public (black, shared by all 28 mice). Inset shows a blowup of the marked cluster with labeled CDR3β AA sequences (nodes) and edges which represent a Levenshtein distance of 1 between connected nodes. (B) Networks formed by a thousand CDR3β sequences randomly chosen from the repertoire of a single mouse. (C) A network formed by the thousand most frequent CDR3 AA sequences in the TCRβ repertoire of a representative human subject (data from [Britanova et al., 2014]). Nodes are colored by their degree of sharing among the 11 young subjects in that study (ages 6–25 years). (D) Mean degree of node connectivity as a function of sharing level in a network formed by the top 1000 CDR3 sequences (blue) or by 1000 randomly chosen sequences (orange). Error bars indicate standard error (SE) across the 12 mice used in this study.

DOI: http://dx.doi.org/10.7554/eLife.22057.002

Figure 1—figure supplement 1. Mean number of clustered nodes as a function of the sample size selected for generating the network.

(Right panel is a zoomed-in version of the left panel). Results are shown for 4 representative conditions, with different levels of observed network connectivity, as expressed by the number of clustered nodes (degree >0). These graphs show that regardless of sample size, (A, B) networks from a naïve mouse are the most connected, followed by those of immunized (p277), aged mice, and lastly p277 in vitro stimulation, which is the least connected. (C, D) networks for 39 human samples (Britanova et al., 2014) divided into 4 age groups. Above ~1000 sequences, the trend is linear; hence the relative fraction of clustered nodes is not sensitive to sample size. Thus, our analysis of network connectivity is not sensitive to the number of sequences used.

Figure 1—figure supplement 2. CDR3β sequences form networks with clusters dominated by J-genes and heterogeneous for V-genes.

An example of a network constructed from the 1000 most abundant CDR3β AA sequences from a single mouse. Both panels show the same network. In the left panel, nodes are colored by the dominating J-gene; in the right panel color indicates the dominating V-gene for each AA sequence. Network clusters mostly consist of a single J-gene, with only a few clusters featuring two or three primary J-genes (left). In contrast, V-gene usage in clusters is heterogeneous, with no obvious dominating gene segment (right). This pattern of clusters with homogenous J-gene and heterogeneous V-gene usage was consistent in all top 1000 CDR3β AA sequence networks we examined.

Figure 1—figure supplement 3. CD8+ T cell networks formed by the thousand most frequent CDR3 AA sequences expressed in two mice.

Nodes (CDR3 AA sequences) were connected by edges defined by a Levenshtein distance of 1.

Figure 1—figure supplement 4. Networks from C3H.HeSnJ mouse strain bearing the H2k MHC haplotype.

CD4+ T cell networks formed by the thousand most frequent CDR3 AA segments expressed in two mice. Nodes (CDR3 AA sequences) were connected by edges defined by a Levenshtein distance of 1.

The mean betweenness centrality is presented as a function of the sharing level in the dataset of 28 mice, for networks composed of the 1000 most frequent CDR3 AA sequences and for networks composed of 1000 randomly selected CDR3 AA sequences from the dataset. Error bars indicate standard error (SE) across the 12 mice used in this study.

TCRβ repertoires of 11 healthy young human subjects previously investigated by Britanova et al. (2014). Shown is the mean degree of nodes as a function of their sharing level in the dataset, for networks composed of the 1000 most frequent CDR3 AA sequences and for networks composed of 1000 randomly selected sequences. Note that public human TCRs manifest a higher degree of connectivity than do private TCRs.

We repeated this analysis for all 12 mice, and found that of the thousand most frequent CDR3 sequences in each mouse (with an accumulated frequency of 34.5 ± 8% of total sequences), 647 ± 104 (mean ± SD) were clustered, with 1282 ± 383 edges. In contrast, networks composed of a thousand randomly selected CDR3 sequences from a single mouse (with an accumulated frequency of 5 ± 0.7% of total sequences) were much sparser (Figure 1B), with only 225 ± 64 sequences clustered, and with 152 ± 52 edges (average values for 10 independent randomized sets of sequences). These results were not sensitive to the number of sequences used for the analysis (Figure 1—figure supplement 1).
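The two node sets compared above (the most frequent sequences versus a random draw) and their accumulated frequencies could be obtained along the following lines. This is a sketch under assumptions: `counts` is a hypothetical mapping from CDR3 AA sequence to read count, and the random draw is taken uniformly over unique sequences, which the text does not state explicitly.

```python
import random


def top_n(counts, n):
    """The n most abundant sequences (ties broken alphabetically)."""
    ranked = sorted(counts, key=lambda s: (-counts[s], s))
    return ranked[:n]


def random_n(counts, n, seed=0):
    """n unique sequences drawn uniformly at random (assumed sampling scheme)."""
    rng = random.Random(seed)
    return rng.sample(sorted(counts), n)


def accumulated_frequency(counts, selected):
    """Fraction of all reads accounted for by the selected sequences."""
    total = sum(counts.values())
    return sum(counts[s] for s in selected) / total


# Toy repertoire: sequence -> read count (made-up values)
counts = {"SEQ_A": 50, "SEQ_B": 30, "SEQ_C": 15, "SEQ_D": 5}
top = top_n(counts, 2)
top_share = accumulated_frequency(counts, top)
rand_share = accumulated_frequency(counts, random_n(counts, 2, seed=1))
```

Repeating `random_n` with different seeds gives independent randomized sets, analogous to the 10 sets averaged in the text.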

To contrast the TCR networks with their BCR counterparts, we tested whether these networks are structurally similar. BCR networks have been shown to center around highly abundant clones, representing a snapshot of the individual-specific local evolution driven by SHM. However, we found no correlation (R² = 0.11 ± 0.07) between the abundance of a TCR CDR3 sequence and its degree of connectivity in the network (number of edges connecting it to other sequences). We further found that each cluster typically contained sequences of a single (or in some cases two) specific J segment (Figure 1—figure supplement 2). V usage, in contrast, was not cluster-specific; any cluster contained sequences with many different V segments (Figure 1—figure supplement 2). This reflects the higher number of V segments compared with J segments, as well as their lower overlap with CDR3 and the relative similarity of their 3’ ends. Networks of similar connectivity were obtained also for the top 1000 CDR3β sequences from CD8 T cells, and for CD4 T cells of a different mouse strain (C3H.HeSnJ), that bears a different MHC haplotype (H2k; Figure 1—figure supplement 3, Figure 1—figure supplement 4).

We found a parallel network organization also in human TCRβ repertoires: we analyzed previously published data containing the TCRβ repertoires of 39 human subjects of different ages (Britanova et al., 2014), and found that the most abundant CDR3 sequences formed connected clusters in human TCR repertoires (Figure 1C, Supplementary file 1, and Figure 1—figure supplement 1), though with a lower connectivity than that found in the similarity networks of inbred mice. From the thousand most frequent CDR3 sequences (accumulated frequency of 17.1 ± 6.6% of total sequences) in each of the 11 young human subjects in that study (ages 6–25 years), 207 ± 79 nodes were clustered, with 367 ± 201 edges. Networks composed of randomly selected sequences from the individual subjects generated only 8 ± 4 clustered nodes with 4 ± 2 edges. We thus conclude that these newly discovered TCR similarity networks are likely to be driven by conserved evolutionary forces, as opposed to BCR networks that are generated by SHM that operates within individuals.

Next, we tested whether these TCR networks reflect our previous finding that TCRβ CDR3 AA sequences express a range of sharing levels between individual mice. As a measure of sharing level, we used a reference dataset of 28 mice (Madi et al., 2014) and assigned to each CDR3 AA sequence in a network a sharing level ranging from 1 (private, found in only one mouse in the reference dataset) to 28 (public, found in all 28 mice in the reference dataset) (Madi et al., 2014). Interestingly, we found a strong association between the sharing level of a CDR3 sequence and its connectivity in the network: highly shared sequences are positioned at the center of network clusters (Figure 1A). This is indicated by a statistically significant correlation between the degree of node connectivity (number of edges connecting it to other nodes in the network) and its sharing level (Figure 1D; R = 0.69 ± 0.03, p-value < 2.2e-16; see also Supplementary file 1). An independent method for estimation of node centrality, betweenness centrality, confirmed the correlation between CDR3 sharing and centrality for the 1000 most abundant CDR3 sequences, but not for a random set of expressed sequences (Figure 1—figure supplement 5, Supplementary file 1). As in mice, public CDR3 sequences in humans manifested a higher degree of connectivity than did more private sequences (Figure 1C, Figure 1—figure supplement 6), and sequence abundance was not correlated with its level of connectivity (Supplementary file 1). Thus, private and public CDR3 sequences are distributed differently across the mouse and human networks: public sequences are highly connected to other similar sequences and are more central in network clusters; in contrast, more private sequences are found at the edges of clusters, or as un-connected nodes, with rare similarity to other sequences in the network.
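The sharing level and its relation to node degree can be sketched as follows. The code assumes each individual's repertoire is available as a collection of CDR3 AA strings and that the network is given as node and edge lists; all names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def sharing_levels(repertoires):
    """Number of individuals in which each sequence occurs (1 = private)."""
    levels = Counter()
    for rep in repertoires:
        levels.update(set(rep))  # count each sequence once per individual
    return levels


def node_degrees(nodes, edges):
    """Degree (number of incident edges) of every node, including isolated ones."""
    deg = {n: 0 for n in nodes}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def mean_degree_by_sharing(degrees, levels):
    """Mean node degree for each sharing level present in the network."""
    grouped = defaultdict(list)
    for seq, d in degrees.items():
        grouped[levels.get(seq, 0)].append(d)  # 0 = absent from the reference set
    return {lvl: sum(v) / len(v) for lvl, v in sorted(grouped.items())}


# Toy reference set of three individuals and a three-node network
reference = [["A", "B"], ["A", "C"], ["A", "B", "B"]]
levels = sharing_levels(reference)
degrees = node_degrees(["A", "B", "C"], [("A", "B"), ("A", "C")])
profile = mean_degree_by_sharing(degrees, levels)
```

With the 28-mouse reference set described above, the keys of `profile` would run from 1 (private) to 28 (public), giving the kind of curve plotted in Figure 1D.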

These findings of a similar organization of mouse and human TCR networks prompted us to look for the existence of shared CDR3β sequences between the two species. Interestingly, we found that a substantial number of TCRβ CDR3 AA sequences were shared by mice and humans. Out of 5,247,785 unique AA sequences in the human dataset (11 young individuals) and 371,977 in the mouse dataset (28 animals), 27,337 were shared by at least one mouse and one human individual. In general, CDR3 sequences with a higher level of sharing in mice were found to have an increased probability of being found in human repertoires; similarly, sequences more shared in humans were found more frequently in mice (Figure 2A, Figure 2—figure supplement 1). Of note, more than 25% of the public CDR3 sequences (found in all 11 young human subjects, or found in all 28 mice) were found also in at least one individual of the other species (Figure 2A).
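As a rough check on scale, the 27,337 shared sequences correspond to about 7.3% of the 371,977 unique mouse sequences and about 0.5% of the 5,247,785 unique human sequences. The per-sharing-level percentages plotted in Figure 2A could be computed along these lines; the sketch assumes a precomputed mapping from sequence to sharing level in one species and a set of sequences observed in the other, with hypothetical names throughout.

```python
def fraction_found_in_other_species(levels, other_species_seqs):
    """For each sharing level, the fraction of sequences also seen in the other species."""
    other = set(other_species_seqs)
    totals, hits = {}, {}
    for seq, lvl in levels.items():
        totals[lvl] = totals.get(lvl, 0) + 1
        if seq in other:
            hits[lvl] = hits.get(lvl, 0) + 1
    return {lvl: hits.get(lvl, 0) / totals[lvl] for lvl in sorted(totals)}


# Toy example: sharing levels in species 1, sequences observed in species 2
levels_species1 = {"S1": 1, "S2": 1, "S3": 2, "S4": 2}
seen_in_species2 = {"S1", "S3", "S4"}
fractions = fraction_found_in_other_species(levels_species1, seen_in_species2)
```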

Figure 2. TCR repertoires are focused around public and cross species- (CS-) public CDR3 AA sequences shared by mice and humans.

(A) Human (left) or mouse (right) CDR3 sequences are grouped according to their sharing level in the corresponding dataset. For each sharing group, we plotted the percentage of sequences that were shared by at least one subject of the other species. (B) Examples of CS-Public CDR3 sequences, and their V and J segments in mouse and human repertoires. (C) A network formed by the top 1000 CDR3 sequences of a single human subject. Node color represents its sharing within or between species: Pink - shared by all 11 human subjects; Green - shared by at least 25 of the 28 mice; Black – CS-public nodes shared by all 11 humans and at least 25 mice; Blue - not shared. (D) The mean number of edges per node (degree) in the 11 human and 28 mouse networks, subdivided into the four categories as in C. Error bars mark SE.

DOI: http://dx.doi.org/10.7554/eLife.22057.009

(A) All CDR3β sequences in the 28 mouse dataset were categorized according to their sharing level, from private (found in only one mouse), to public (found in all 28 mice). The graph presents the percent of sequences within each category that were also found in the human dataset (in at least 1 of 11 young subjects). (B) All CDR3β sequences in the 11 young human subjects were categorized according to their sharing level, from private (found in only one subject), to public (found in all subjects). The graph presents the percent of sequences within each group that were also found in at least one of the 28 mice. In both cases, the fraction of cross-species sequences increases with the sharing level; sequences that are more public in one species are more frequently found in the other species.

We generated 100 datasets of simulated human and mouse repertoires, with number of individuals (11 humans, 28 mice) and repertoire sizes as in the experimental data. For each of the 86 observed CS-public sequences, we plot its mean sharing level in the simulations, for human repertoires (red) and mouse (blue) repertoires. The top panel shows 54 sequences that are CS-public in both experiment and simulations. The lower panel shows 32 sequences that are CS-public in the experimental data but not in the simulations. Note that there were additionally about 200 CS-public sequences in the simulations which were not CS-public in the data.

Figure 2—figure supplement 3. CS-Public CDR3 sequences are central in mouse TCRβ networks.

Shown is a representative network of the 1000 most frequent sequences from a mouse. Nodes are labeled according to 4 categories: CDR3 sequences that are not public; CDR3 sequences shared by all 11 human samples; CDR3 sequences shared by at least 25 mice; CDR3 sequences shared by at least 25 mice and all 11 humans.

Figure 2—figure supplement 4. Degree of CS-public sequences is correlated in mouse and human TCR networks.

Each dot represents one CS-public sequence that is found among the most abundant 1000 sequences in at least one mouse and at least one human subject (n = 45 sequences). There is a significant correlation between the degree of CS-public sequences in the two species (R = 0.65, Spearman); sequences that are more connected in one species are typically more connected in the other as well.

We defined a set of cross-species (CS) public CDR3 sequences that were public or relatively public in both mice (found in at least 25 of the 28 mice) and humans (found in all 11 young individuals). All these 86 CS-public sequences contained the human Jβ2.7 or Jβ2.3 segments, and the mouse Jβ2.5 or Jβ2.7 segments. V usage was dominated by Vβ20.1 in humans, but a more diverse V usage was observed in mice. Examples of CS-public sequences are shown in Figure 2B. The CS-public CDR3 sequences manifested a significantly higher degree of connectivity in human and mouse networks than did CDR3 sequences that were public only in humans, only in mice or not public in either (Figure 2C,D and Figure 2—figure supplement 2). Moreover, we found a significant correlation between the mean degrees of CS-public sequences in mouse and human networks (Figure 2—figure supplement 3); CS-public sequences that have more neighbors in mouse networks also tended to have more neighbors in human networks, suggesting an evolutionarily conserved network structure. We note that while CS-public sequences are central in network clusters, their frequency is not higher than that of other public sequences that are found only in humans or in mice. These findings suggest that similar driving forces may generate and expand particular public CDR3 TCR sequences that contain conserved sequence motifs in the two species.

To further characterize the mechanisms that contribute to the generation of CS-public sequences, we evaluated their existence in synthetic TCR repertoires that simulate the random generation of TCR sequences (see methods). These simulations do not include any clonal selection, thus they allow discrimination between genetic mechanisms that influence the generation of TCRs and selection mechanisms that shape it somatically. We generated 100 datasets of simulated repertoires of 28 mice and 11 humans, the sizes of which matched the sizes of the experimental repertoires. The simulated repertoires contained a somewhat larger number of CS-public CDR3 sequences than observed in the experimental data (average of 221 ± 9 in the simulations, vs. 86 in the data). The simulated CS-public sequences contained the same restricted set of mouse and human J segments, which are highly similar between the two species (J2.7 mouse and human; J2.5 mouse/J2.3 human). Thus, sequence homology of J segments contributes to the formation of CS-public TCRs, but is not sufficient by itself, and is accompanied by other mechanisms that induce bias in the recombination process (e.g. biased V segment usage, statistics of nucleotide deletions and insertions at V-D and D-J junctions). We also asked whether the simulated repertoires contained the same CS-public sequences as those observed experimentally. We found that 54 out of the 86 experimentally observed CS-public sequences were identical to simulated CS-public sequences, while 32 were not CS-public in the simulations (Figure 2—figure supplement 4). The partial overlap between simulations and data may result from inaccuracies in the assumptions of the simulations regarding the random TCR generation process, or indicate that selection mechanisms in the thymus and in the periphery further influence the existence of specific CS-public sequences.

We further evaluated the similarity between public sequences by analyzing the level of connectivity within a network composed of the most highly shared CDR3 sequences. A network formed by the 1000 most public mouse sequences (found in >25 of the 28 mice) was highly connected, with 965 clustered nodes and 3387 edges (Figure 3A). In contrast, networks formed by the 1000 most abundant private sequences (found in only one of the 28 mice) were very sparse, manifesting only 38 ± 15 clustered nodes and 20 ± 7 edges (mean ± SD, averaged over 28 mice). Similarly, a network formed by the 1000 most public human CDR3 sequences was also highly connected (with 969 clustered nodes and 4398 edges, Figure 3B).

Figure 3. Public CDR3 sequences form highly connected similarity networks in mice and humans and are enriched for self-associated immune reactivities.

(A) A network formed by the 1000 most shared mouse CDR3 sequences (found in >25 of 28 mice). Node size corresponds to the mean abundance of the sequence. Nodes are colored according to their cluster association. 124 CDR3 sequences that were previously annotated (see [Madi et al., 2014]) were added to the network and are presented as arrowheads. 63 annotated sequences were either identical to, or at a Levenshtein distance of 1 from one of the nodes, and are listed next to each cluster (with the corresponding color). Annotations of 61 un-clustered sequences are also listed. (B) A network formed by the 1000 most frequent public CDR3 sequences in humans (found in all 11 subjects). Previously annotated mouse (n = 124) and human (n = 30) CDR3 sequences were added to the network as in A (arrowheads). The clusters were distinctly colored in order to visually match between clusters and their annotated sequences, not to define antigen specificity of a cluster. A list of linked annotated CDR3 sequences is shown next to each cluster (11 of 30 human and 23 of 124 mouse annotated CDR3 sequences), together with a list of unclustered annotated human sequences.

DOI: http://dx.doi.org/10.7554/eLife.22057.014

Figure 3—figure supplement 1. Public CDR3 sequences form highly connected similarity networks in mice and are enriched for self-associated immune reactivities.

Sequence visualization of the red (top right) cluster in the mouse CDR3 sequences network shown in Figure 3A. The original full network is formed by the 1000 most shared mouse CDR3 sequences (found in >25 of 28 mice). 124 CDR3 sequences that were previously annotated (see [Madi et al., 2014]) were added to the network and are presented as red arrowheads. 13 annotated sequences were either identical to, or at a Levenshtein distance of 1 from one of the nodes in this cluster, and their associated pathology/antigen is listed next to the corresponding node.

The functional TCR is formed by a complex of TCR alpha and beta chains (Davis and Bjorkman, 1988), hence one cannot attribute specific antigen recognition to CDR3β segments alone. Moreover, the current level of understanding precludes the development of general predictive tools that can computationally relate a TCR sequence to an antigen that it recognizes. Defining TCR antigen specificity is further complicated by substantial TCR cross-reactivity (Burrows et al., 1997; Wooldridge et al., 2012). Yet, TCRβ sequences that bind the same pMHC antigen do contain shared CDR3β sequence motifs (Klinger et al., 2015; Chen et al., 2017; Sun et al., 2017; Tickotsky et al., 2017). Thus, some insight into antigen specificity can be gained by linking the sequence-similarity networks to previously annotated TCR sequences. We have reported that 124 of the CDR3β sequences in our mouse dataset were associated with various mouse immune reactivities previously described in the literature (Madi et al., 2014). As a step towards relating antigen specificity to the clusters of public CDR3 sequences, we looked for these 124 annotated CDR3β sequences within the clusters of shared CDR3 sequences. The annotated sequences were grouped according to four categories: a) Immunity to foreign pathogens; b) Allograft reactions; c) Tumor-associated T cells; and d) Autoimmune conditions. Figure 3A includes these annotations in the network formed by the 1000 most public CDR3β sequences. Out of the 124 annotated sequences, 63 were either identical to one of the existing nodes (n = 11), or linked to an existing node by a Levenshtein distance of 1 (n = 52). The clustered annotated nodes were found to be enriched with annotations related to self or self-like autoimmune, cancer or allograft reactions (self-related: 51/63 = 81% of network-clustered sequences vs. 85/124 = 69% in all 124 annotated sequences, compared to non-self: 12/63 = 19% in clusters vs. 39/124 = 31%; Fisher exact test p=0.0035).
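For readers who want to see how a Fisher exact test on such counts works, here is a minimal standard-library sketch. The 2×2 layout used in the example (clustered vs. unclustered annotated sequences, self-related vs. non-self) is an assumption derived from the counts quoted above; the text does not spell out the exact table behind the reported p-value, so no attempt is made here to reproduce it.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # small relative tolerance guards against floating-point ties
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Assumed layout from the counts in the text:
# clustered:   51 self-related, 12 non-self
# unclustered: 85 - 51 = 34 self-related, 39 - 12 = 27 non-self
p_value = fisher_exact_two_sided(51, 12, 34, 27)
```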

We find that sequences with a similar annotation tended to be linked in the same cluster. Examples include twelve sequences of tumor infiltrating regulatory T cells (Sainz-Perez et al., 2012) which were found in cluster #2; six COPD related CDR3 sequences (Motz et al., 2008) in cluster #6; and four CDR3 sequences connected with cluster #2 that were associated with type 1 diabetes in NOD mice in two different studies (Nakano et al., 1991; Tikochinski et al., 1999). However, different annotations can also be found in the same cluster (Figure 3A); for example, mouse CDR3 sequences associated with experimental autoimmune encephalomyelitis (EAE; [Menezes et al., 2007]) and collagen-induced arthritis (CIA; [Osman et al., 1993]) were also connected to cluster #2. Figure 3B shows that many previously annotated self/self-like sequences of humans and mice were also linked to clusters in the network of public human sequences. Thus, the CDR3 clusters, which serve as repertoire foci, seem to be enriched with TCR sequences that are associated with self (or self-like) reactivities, whereas pathogen-associated TCR sequences are less clustered and so tend to be more evenly spread throughout sequence space.

To analyze mechanisms involved in network formation, we investigated the contribution of antigen selection using two complementary approaches. First, we analyzed similarity networks formed by CDR3 sequences of CD4−CD8− double-negative (DN) thymocytes. Rearranged TCRβ chains in DN cells are not subject to MHC-dependent selection, which only occurs at later stages of thymic development. We found that networks formed by DN CDR3 sequences were significantly less connected compared to splenic CD4+ T cells, which have undergone antigen selection (Figure 4A and Supplementary file 2). In addition, DN thymocytes and CD4+ spleen T cells manifested different levels of convergent recombination (Venturi et al., 2006, 2008). Public CDR3 AA sequences in DN thymocytes were encoded on average by a low number of nucleotide (nt) sequences, whereas the same AA sequences were encoded by a much larger number of nt sequences in CD4+ splenic T cells (Figure 4C, Figure 4—figure supplement 1). The finding of relatively increased network clusters in T cells that have undergone antigen selection suggests that the CDR3 AA sequences that are found within clusters are positively selected; this antigen selection would extend any underlying physical bias generated during TCR DNA recombination in the thymus (Murugan et al., 2012; Ndifon et al., 2012).
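Convergent recombination, as used above, amounts to counting how many distinct nucleotide sequences encode the same CDR3 AA sequence. A minimal sketch, assuming the repertoire is available as (nucleotide, amino acid) pairs; the function names and the short toy fragments are hypothetical.

```python
from collections import defaultdict


def nt_variants_per_aa(pairs):
    """Number of distinct nt sequences observed for each AA sequence."""
    variants = defaultdict(set)
    for nt, aa in pairs:
        variants[aa].add(nt)
    return {aa: len(nts) for aa, nts in variants.items()}


def mean_convergence(pairs, public_aa):
    """Mean number of nt variants over the given (e.g. public) AA sequences."""
    counts = nt_variants_per_aa(pairs)
    vals = [counts[aa] for aa in public_aa if aa in counts]
    return sum(vals) / len(vals) if vals else 0.0


# Toy fragments: three synonymous encodings of "CA", one of "CS" seen twice
pairs = [
    ("TGTGCC", "CA"),
    ("TGCGCC", "CA"),
    ("TGTGCT", "CA"),
    ("TGTAGC", "CS"),
    ("TGTAGC", "CS"),
]
per_aa = nt_variants_per_aa(pairs)
avg = mean_convergence(pairs, ["CA", "CS"])
```

Running the same computation on DN thymocyte and CD4+ splenic repertoires and comparing the two averages would mirror the comparison described in the text.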

Figure 4. MHC-dependent public CDR3 sequences form highly connected similarity networks.

(A) Mean number of clustered nodes in networks formed by the top 1000 CDR3 sequences from the following repertoires: DN thymocytes (CD4−CD8−) (n = 3), CD4+ spleen T cells (n = 3), Quad-KO mice (Van Laethem et al., 2007) (lack MHC-I, MHC-II, CD4 and CD8) (n = 4), and their WT controls (C57BL/6) (n = 4). Error bars signify standard error. (B) Cumulative frequency of the 86 CS-public CDR3 sequences (observed in the reference datasets of 28 WT mice and 11 healthy humans) is shown for: DN thymocytes (CD4−CD8−) (n = 3), CD4+ spleen T cells (n = 3) (left), Quad-KO mice (n = 4), and their WT controls (C57BL/6) (n = 4). Error bars signify standard error. (C) Cumulative frequency of nucleotide sequences coding for two annotated (C9 and COPD, top) and two unknown (bottom) public AA CDR3 sequences from repertoires of DN thymocytes and CD4+ spleen T cells (sequences from 3 mice are shown). Each color represents a different nucleotide sequence.

DOI: http://dx.doi.org/10.7554/eLife.22057.016

Figure 4—figure supplement 1. DN thymocytes manifest lower convergent recombination.

Comparison of the number of nt sequences encoding, on average, an AA CDR3 sequence, for public CDR3 AA sequences, found to be shared by more than 25 out of 28 mice in the reference dataset. Public CDR3 sequences coming from DN thymocytes were encoded on average by a lower number of nucleotide (nt) sequences compared to those from CD4+ splenic T cells (p<2.2e-16 for each of these top sharing levels).

To further study the impact of selection, we evaluated TCR networks formed in the repertoires of splenic T cells from mice lacking four elements needed for physiological MHC-dependent antigen selection: MHC-I and -II molecules together with CD4 and CD8 co-receptor molecules, so-called Quad-KO mice (Van Laethem et al., 2007, 2013). In contrast to wild-type (WT) mice, the TCRs of Quad-KO mice are selected by MHC-independent ligands in the thymus and their T cells express a diverse MHC-independent TCR repertoire in the periphery (Van Laethem et al., 2007; Tikhonova et al., 2012; Van Laethem et al., 2013). We found that similarity networks formed by the top 1000 CDR3 sequences from Quad-KO mice were significantly less connected than those of the WT strain (C57BL/6) measured in the same set of experiments (Figure 4A and Supplementary file 2). Together, these findings indicate that MHC-dependent thymic selection plays a significant role in promoting the formation of dense clusters of TCR-similarity networks. Lack of MHC-dependent selection in DN thymocytes and in Quad-KO mice is associated with TCR networks of reduced connectivity; in contrast, TCRs that are subject to MHC selection form dense networks with a higher level of convergent recombination. Thus, recombination biases combined with clonal selection generate a TCR repertoire that is not uniform, but rather focused in specific regions of sequence space that are preferentially associated with self-related antigen-reactivities.

Following these observations, we tested whether the relative abundance of CS-public clonotypes is increased by MHC-dependent selection. To this end, we compared the frequency of CS-public sequences in repertoires of Quad-KO mice and DN thymocytes to those of control WT mice (Figure 4B). The cumulative frequencies of the CS-public CDR3 sequences in two sets of experiments done with WT mice (the 28 WT mice used in the network analysis, and the WT mice used as controls in the Quad-KO experiment) show no significant difference (P value = 0.293). On the other hand, the Quad-KO repertoires exhibited a lower total frequency of the CS-public CDR3s compared with both the 28 WT mice (P value = 4.318e-09) and the Quad-WT mice (P value = 0.01781). The cumulative frequency in the DN thymocytes shows a similar trend, but without statistical significance (P value = 0.1877). Together, these results indicate that, although sequence homology of V and J germline segments between mice and humans and bias in the recombination process influence the probability for a sequence to be shared between the two species, additional selection forces are influencing its abundance.

Since the composition of the TCR repertoire of an individual changes in response to immune challenges throughout life, we tested the effects of both immunization and aging on the network organization of the TCR repertoire. We immunized naïve mice with p277, a self peptide derived from HSP60 (heat shock protein 60), or with a foreign peptide, derived from ovalbumin (OVA). Peptide p277 was previously found to be recognized by the C9 public TCR in NOD mice (Tikochinski et al., 1999), and the CDR3β sequence of the C9 clone was also public in C57BL/6 mice (Madi et al., 2014). Additionally, we analyzed the network structures in the TCR repertoires of T cells from the immunized mice that were further cultured in vitro with antigen presenting cells loaded with the specific peptide. The distribution of sequence abundances and repertoire evenness were evaluated using the Gini inequality coefficient, which ranges from 0 for a repertoire where every sequence is present in equal abundance, to 1 for a repertoire dominated by a single sequence, with other sequences present at zero abundance (Bashford-Rogers et al., 2013; Thomas et al., 2013).
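The Gini coefficient described above can be illustrated with a short calculation. The study computed it with R packages; the following is only a minimal Python sketch of the standard (uncorrected) Gini formula, and the function name and the toy abundance values are illustrative assumptions. Note that for a finite repertoire of n sequences this form reaches at most 1 − 1/n, which approaches 1 as n grows.

```python
def gini(abundances):
    """Uncorrected Gini coefficient of non-negative clone abundances.

    Hypothetical helper for illustration; not the study's own code.
    """
    values = sorted(abundances)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive abundance")
    # On sorted values x_1 <= ... <= x_n:
    # G = sum_i (2i - n - 1) * x_i / (n * sum_i x_i)
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(values, start=1))
    return weighted / (n * total)


# Toy repertoires (made-up counts, not data from the study)
even_repertoire = [5, 5, 5, 5]        # every sequence equally abundant
dominated_repertoire = [0, 0, 0, 10]  # a single sequence dominates

print(gini(even_repertoire))
print(gini(dominated_repertoire))
```

With these toy inputs the even repertoire gives 0 and the dominated one gives 0.75, the maximum for four sequences.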

We found that immunization with either peptide resulted in repertoires that contained a set of expanded CDR3 sequences and had an increased abundance inequality. In vitro re-stimulation further increased inequality (Figure 5A–C and Supplementary file 3). This inequality was associated with the emergence of private clones that dominated the post-immunization repertoire, such that the relative weight of public clones was reduced (Figure 5E). Interestingly, immunization was also associated with network disruption; the number of clustered nodes and the number of edges both fell after immunization in vivo and fell further after in vitro re-stimulation (Figure 5D, Figure 5—figure supplement 1). Both the increased inequality and the decreased network connectivity reversed spontaneously in the OVA-immunized mice 2 months following immunization (Figure 5D,E (right), Figure 5—figure supplement 1). Similar to immunization, repertoires in aged mice (Figure 5F, Figure 5—figure supplement 2) and in aged humans (Figure 5G, Figure 5—figure supplement 3) were more unequal and less connected than those of young individuals, and private CDR3 sequences became relatively more abundant with age (Figure 5—figure supplement 4). Altogether, we found a strong anti-correlation between the Gini Coefficient of TCR inequality and the number of connected nodes in TCR networks in mice (Figure 5F, Spearman correlation = −0.661) and in humans (Figure 5G, Spearman correlation = −0.865).
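The anti-correlation reported above is a Spearman rank correlation between two per-sample quantities (Gini coefficient and number of clustered nodes). As a hedged illustration only, the sketch below computes Spearman's coefficient in plain Python as the Pearson correlation of average ranks; the function names and the toy values are assumptions and do not reproduce the study's data or its R code.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up per-sample values: higher inequality paired with fewer clustered nodes
gini_values = [0.20, 0.35, 0.50, 0.65]
clustered_nodes = [600, 520, 410, 300]
print(spearman(gini_values, clustered_nodes))
```

In this toy example the two quantities are perfectly inversely ordered, so the coefficient is −1; real samples would give intermediate values such as those reported in the text.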

Figure 5. Immunization, in vitro antigen re-stimulation, anti-CTLA4 antibody treatment and aging perturb TCR networks coupled with an increase in repertoire skewness.

(A–C) Networks of the thousand most frequent CDR3 sequences are shown for (A) a naïve mouse, (B) a mouse immunized with a self-peptide (p277), and (C) T cells from the spleen of an immunized mouse, which were re-stimulated in vitro with the p277 peptide. (D) Mean number of clustered nodes in networks formed by the top 1000 CDR3 sequences from the following repertoires: Left: naïve mice (n = 12); p277 immunized mice, 7d post immunization (n = 5); and in-vitro re-stimulated with p277 (n = 5). Right: naïve mice (n = 12); OVA immunized mice, 7d post immunization (n = 5); in-vitro re-stimulated with OVA peptide (n = 3); and immunized mice, 2 months post-immunization (n = 5). Error bars indicate standard error. (E) Frequency of the top 1000 most frequent CDR3 sequences by sharing level, for the same repertoires as in (D). Sharing levels were calculated based on sharing in the reference dataset of 28 mice. (F) The Gini Coefficient (a measure of repertoire inequality) plotted vs. the number of clustered nodes, for the top 1000 CDR3 sequences from the repertoires from (D, E) and from aged mice (n = 3). (G) The Gini Coefficient plotted vs. the number of clustered nodes for 39 human samples (Britanova et al., 2014) divided into 4 age groups. (H) The number of clustered nodes (left) and the number of public clonotypes (right, shared by all 11 young human samples in a reference cohort [Britanova et al., 2014]) for the top 1000 most abundant CDR3 sequences in 21 paired samples of patients at baseline and 30 to 60 days after receiving CTLA4 blockade treatment with tremelimumab (data from [Robert et al., 2014]). (I) Number of public clonotypes (defined as in H) out of the top 1000 most abundant CDR3 sequences in either healthy donors (left) or Juvenile Idiopathic Arthritis patients (right). (J) A conceptual figure of the evolution of repertoire structure. In young and healthy individuals the repertoire is focused and even (top-right), with public and CS-public CDR3 sequences at the center of network clusters. Following an immune response, or with aging, the repertoire becomes more skewed and spread in sequence space (bottom-left), due to preferential expansion of private clones at the expense of more public clones.

DOI: http://dx.doi.org/10.7554/eLife.22057.018

Figure 5—figure supplement 1. Immunization and in vitro antigen stimulation affect network architecture.

(A) The number of edges in networks formed by the 1000 most abundant CDR3 sequences in three TCR datasets: 12 naïve mice; 5 mice immunized with peptide p277 (HSP60 437–460 VLGGGCALLRCIPALDSLTPANED) emulsified in Complete Freund’s Adjuvant (CFA); and 5 mice immunized with p277+CFA whose splenic T cells were stimulated in-vitro with peptide p277. (B) The number of edges in networks formed by the 1000 most abundant CDR3 sequences in four TCR datasets: 12 naïve mice; 5 mice immunized with OVA 323–339 peptide (ISQAVHAAHAEINEAGR) in CFA; 3 mice immunized with OVA+CFA whose splenic T cells were stimulated in-vitro with the same OVA peptide; and 5 mice immunized with OVA+CFA whose splenic T cells were analyzed 2 months post-immunization.

Figure 5—figure supplement 2. Mouse TCR Networks become less connected with aging.

A comparison of network clusters in young and aged mice. Network representations of the 1000 most frequent clones in (A) young and (B) aged mice. The networks composed of the 1000 most frequent clones in the young mice (n = 3) manifested 590.3 ± 61.9 clustered nodes with 992.7 ± 147.4 edges. In contrast, networks composed of the 1000 most frequent clones in the aged mice (n = 3) had 334.7 ± 63.5 clustered nodes with 362.3 ± 153.8 edges. Nodes are colored according to the sharing level of their corresponding CDR3 sequence in the 28 mice reference dataset.

Figure 5—figure supplement 3. Human TCR Networks become less connected with aging.

A comparison of the connectivity of networks formed by the thousand most frequent CDR3 AA segments expressed in 39 humans at different ages (data from Britanova et al., 2014). The mean degree was calculated for each human sample and colored according to 4 age groups: 6–25, 34–43, 61–66, and 71–90 years.

Figure 5—figure supplement 4. With aging, the repertoire becomes more skewed and spread in sequence space due to preferential expansion of private clones at the expense of more public clones.

Frequency of the top 1000 most frequent CDR3 sequences by sharing level for young (6–8 weeks, n = 3) and aged (17–20 months, n = 3) mice.

Figure 5—figure supplement 5. CTLA4 blockade results in a repertoire that is more skewed and spread in sequence space, due to preferential expansion of private clones at the expense of more public clones.

The cumulative frequency (in %) of relatively private CDR3 sequences from the top 1000 most frequent sequences in the repertoires of patients before and after CTLA4 blockade treatment with tremelimumab (Robert et al., 2014). Sharing was defined by comparison with a reference dataset of CDR3 sequences from 11 young healthy individuals (Britanova et al., 2014): relatively private sequences were defined as CDR3 sequences shared by 0–5 individuals out of 11 in the reference dataset, where 0 indicates a sequence not found in any of the 11 individuals in the reference cohort. There is a significant increase in the frequency of relatively private sequences (P value = 0.01947, paired Wilcoxon signed-rank test).

Another factor that impacted network structure was immune checkpoint blockade. We used published CDR3β sequence data (Robert et al., 2014) from subjects who had undergone CTLA4 (cytotoxic T-lymphocyte-associated protein 4) blockade with tremelimumab. Previous analysis of these data showed that this treatment diversified the peripheral T-cell pool. Applying TCR similarity network analysis, we now show that the 1000 most abundant CDR3 sequences after checkpoint blockade are less connected than before treatment (P value < 0.05, paired Wilcoxon signed-rank test, Figure 5H left); moreover, this reduction in connectivity was accompanied by a decrease in the number of public CDR3 sequences and an increase in the frequency of private ones (P value = 0.01947, paired Wilcoxon signed-rank test, Figure 5H right, Figure 5—figure supplement 5). Thus, broadening of the peripheral repertoire following CTLA4 blockade reduces the presence of public clones and enhances the expansion of private clones, similar to the changes we observed in aging or after immunization. This finding raises the possibility that checkpoint-associated immune regulation could also be involved in the prominence of network connectivity of public T cells. Finally, we analyzed TCR repertoires of patients with the autoimmune disease Juvenile Idiopathic Arthritis (JIA) (Henderson et al., 2016). We found a strong increase of public (network-promoting) TCRs in the peripheral blood of JIA patients compared to healthy donors (P value = 0.0006, Figure 5I). Thus, while immune perturbations such as immunization and aging lead to reduced levels of public clonotypes and a reduction in network connectivity, this specific autoimmune condition is associated with an increased level of public clones, which are putatively associated with self-antigens.
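The public/private classification used in these comparisons rests on a sharing level: the number of individuals in a reference cohort whose repertoire contains a given CDR3 sequence. The following Python sketch shows one way such a calculation could look. It is an illustration under stated assumptions, not the study's code: the function names, the input layout (a dictionary of counts plus a list of reference repertoires), the normalization to the supplied clones only, and the toy sequences are all hypothetical.

```python
def sharing_levels(top_clones, reference_repertoires):
    """Map each CDR3 to the number of reference individuals carrying it."""
    ref_sets = [set(rep) for rep in reference_repertoires]
    return {cdr3: sum(cdr3 in s for s in ref_sets) for cdr3 in top_clones}


def private_frequency(top_clones, reference_repertoires, max_sharing=5):
    """Cumulative frequency (%) of clones shared by at most `max_sharing`
    reference individuals.

    Assumption: frequencies are normalized to the total count of the
    supplied clones, which may differ from the study's normalization.
    """
    levels = sharing_levels(top_clones, reference_repertoires)
    total = sum(top_clones.values())
    private = sum(count for cdr3, count in top_clones.items()
                  if levels[cdr3] <= max_sharing)
    return 100.0 * private / total


# Toy example with 3 reference individuals (made-up placeholder sequences)
top = {"CASSA": 60, "CASSB": 30, "CASSC": 10}
reference = [["CASSA", "CASSB"], ["CASSA"], ["CASSA", "CASSC"]]

print(sharing_levels(top, reference))
print(private_frequency(top, reference, max_sharing=1))
```

Here "CASSA" is found in all three reference individuals, while the other two clones are each found in one, so with a cutoff of 1 the relatively private clones account for 40% of the counts. The default cutoff of 5 mirrors the 0–5 out of 11 definition given in the figure supplement.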

Discussion

Our application of network analysis to TCRβ CDR3 sequencing data reveals a hitherto unrecognized structure of the TCR repertoire in both mice and humans: in young, healthy individuals, the most abundant TCRβ CDR3 sequences are distributed unevenly in sequence-space, with clusters centered around public CDR3s, and in particular around CS-public sequences, which are public both in mice and humans (Figure 5J top-right, even and focused repertoire). The clustering of the most abundant CDR3 sequences in young and healthy individuals results in a repertoire that is much more restricted than would be expected from the random process of TCR somatic recombination. This basic network architecture is modified by immunization and aging due to the dominant expansion of more private CDR3 clonotypes. Thus, public CDR3s that serve as hubs of the TCR networks become less prominent, leading to reduced connectivity of the TCR networks combined with a more skewed repertoire (Figure 5J bottom-left, skewed and spread repertoire). We find that network organization and repertoire evenness are restored with the resolution of immune responses. It might be the case that incomplete resolution of immune responses throughout life leads to an accumulation of changes in the TCR repertoire that eventually results in the skewed and spread (less clustered) repertoires that we observe in aged individuals. Interestingly, TCR repertoires from patients with the autoimmune condition JIA showed increased levels of public TCR sequences. This aligns with our observation that public TCR networks are enriched with self-associated TCRs. Taken together, our analysis supports the idea that the level of network connectivity, frequency of public TCRs and repertoire evenness are linked to each other, and are concurrently modulated by the individual’s immune state (disease/immunization/aging).

Mechanistically, we found that MHC-dependent antigen selection contributes to the formation of dense networks, since reduced network connectivity was observed both in pre-selection DN thymocytes and in the absence of MHC-dependent selection, in the Quad-KO mice. These results can be explained by preferential selection and increased survival, in both the thymus and periphery, of T cells that carry specific CDR3 sequences that recognize self-antigens presented by MHC molecules. Different T cell clones, which carry different CDR3 nt sequences but encode the same AA sequence, would appear to enjoy a common selective advantage and accumulate in the peripheral repertoire. This mechanism can explain our observations of increased convergent recombination in splenic CD4+ T cells compared to DN thymocytes (Figure 4—figure supplement 1). Antigen selection can also account for the enhanced network connectivity of TCRs that differ by one AA in their CDR3 sequences; such related CDR3 sequences may be selected by the same peptide-MHC complex, albeit with different affinities (Moss et al., 1991; Serana et al., 2009; Zoete et al., 2013). This working hypothesis needs to be tested experimentally to see whether linked CDR3 sequences really cross-react with the same or similar peptide-MHC complexes. MHC-antigen selection of public CDR3 sequences takes place on a background of biases in the biophysical process of DNA recombination (Elhanati et al., 2014). Combined, these processes lead to the formation of dense network clusters of the most abundant public TCR sequences, as we report here. In contrast, the most abundant private TCR sequences generate poorly connected networks.
B cell receptor (BCR) sequences (Ben-Hamo and Efroni, 2011; Bashford-Rogers et al., 2013), unlike the T-cell repertoire networks we disclose here, have long been known to generate networks in individual subjects by affinity maturation that is mediated by SHM; T cells do not undergo SHM so TCR networks must be generated in the developmental process. Thus, dominant and public T cell clonotypes have a higher sequence similarity than non-dominant and private ones. In contrast, BCR networks have a distinct structure resulting from the SHM process, in which abundance and degree are correlated, which is not the case in TCR networks.

Our finding that TCR CDR3 networks include identical and related sequences that are not confined to individuals but are shared by most individuals of the same species, and even cross the species divide between mice and humans, suggests that such sequences confer some fundamental evolutionary advantage. As noted above, antigen specificity of a TCR cannot be defined based on its CDR3β alone. However, the same or very similar CDR3β sequences are frequently observed within repertoires of T cells specific for a given antigen, in combination with flexible or preferential pairing with TCRα (Klinger et al., 2015; Chen et al., 2017; Tickotsky et al., 2017). Hence, we hypothesize that T cell clones bearing the conserved, CS-public, CDR3 sequences recognize similar antigenic epitopes that are conserved across species. These antigens may be derived from evolutionarily conserved regions of self proteins, forming a core of T cell reactivities to specific self epitopes, with potential implications for self-maintenance, autoimmunity and cancer. Further studies relating TCRα, TCRβ and peptide specificity will make it possible to test this hypothesis experimentally.

Our results indicate that T lymphocytes ‘focus their attention’ on specific regions in sequence space. These new findings on the organization of TCR repertoires and their dynamics raise intriguing questions. For example, does the existence of network clusters indicate a healthy immune state? Can restoration of network structure reinstate immune function in the elderly or prevent excess inflammation and autoimmune disease? The theory of the immunological homunculus, composed of self-recognizing B cells and T cells (Cohen, 1992, 2000), might be relevant here.

Materials and methods

Mice

Female C57BL/6 mice, 5–8 weeks old, were obtained from Harlan Laboratories. Analysis of TCR sequences from aged mice is based on data previously described in Shifrut et al. (2013). Analysis of TCR sequences from repertoires that are not subject to MHC-dependent selection is based on two sources: Quad-KO mice, which lack four elements needed for physiological MHC-dependent antigen selection (MHC-I and -II molecules together with CD4 and CD8 co-receptor molecules), analyzed together with matched control WT mice (Van Laethem et al., 2007, 2013); and DN thymocytes, which represent the landscape of generated TCRs before thymic selection.

Human data used in this study

A dataset of 39 healthy Caucasian donors, aged 6–90 years, was obtained from Britanova et al. (2014) (Robert et al., 2014). CTLA4 blockade data were obtained from Robert et al. (2014). Juvenile Idiopathic Arthritis (JIA) data of patients compared to healthy donors were obtained from Henderson et al. (2016).

Immunization and in vitro stimulation

Mice were injected intraperitoneally (IP) with 100 μg of either Chicken Ovalbumin (OVA) or peptide 277 (p277) emulsified in CFA (1:1 ratio). Spleens were harvested on day 7 post immunization and T cells were extracted for TCR analysis. In vitro stimulation: T cells from spleens of immunized mice were harvested on day 7 and were re-stimulated with irradiated splenocytes and the relevant peptide antigen. Five of the OVA-immunized mice received a boost IP injection of 100 μg OVA + CFA on day 14, and spleens were harvested on day 60 for TCR analysis (Supplementary file 3).

Library preparation for TCR-seq and data pre-processing

Libraries were prepared and pre-processed as published (Ndifon et al., 2012). Briefly, T cells were purified from splenocytes by magnetic bead separation, total RNA was extracted and reverse transcribed using a TCR Cβ-specific primer linked to the 3'-end Illumina sequencing adapter. cDNA was amplified using PCR with a Cβ-3' adapter primer and a set of 20 Vβ-specific 5' primers, followed by ligation of a 5' Illumina adapter and a second PCR using universal primers for the 5' and 3' Illumina adapters. The libraries were sequenced using Genome Analyzer II or HiSeq 2000 (Illumina). Sequence filtering, VDJ annotation, normalization and translation to AA sequences were performed as published (Ndifon et al., 2012). Libraries for TCR-seq of Quad mice and C57BL/6 controls were sequenced using Illumina sequencers, performed by Adaptive Biotechnologies Corp (Seattle, WA). In brief, αβ T cells were isolated by cell sorting, washed in PBS and lysed in Trizol. RNA was extracted using the RNeasy protocol (Qiagen) and 2 µg per sample was reverse transcribed to cDNA by oligo(dT) priming with the SuperScript III First-Strand Synthesis System (Invitrogen). cDNA was sequenced by Adaptive Biotechnologies Corp.

Statistical analysis and visualization

Statistical analysis was performed using R software (Core Team, 2013). We used the following packages: ‘ShortRead’ (Morgan et al., 2009) for the pre-processing pipeline; ‘ineq’ (Zeileis, 2012) and ‘reldist’ (Handcock, 2014) to calculate the Gini coefficient; ‘igraph’ (Csardi and Nepusz, 2006) to create network objects and to obtain the degree of a node and its betweenness; ‘stringdist’ (van der Loo, 2014) to calculate Levenshtein distances; and ‘ggplot2’ (Wickham, 2009) for generating figures. The statistical tests performed are stated in the text. All network figures were made using Cytoscape (http://www.cytoscape.org/) (Cline et al., 2007; Smoot et al., 2011; Saito et al., 2012).
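The network statistics used throughout (edges and clustered nodes derived from Levenshtein distances between CDR3 sequences) can be sketched as follows. This is a plain-Python illustration rather than the R packages the study used, and it rests on stated assumptions: two distinct CDR3 amino-acid sequences are linked when their Levenshtein distance is at most 1, and a clustered node is any sequence with at least one edge. The threshold parameter, the function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def similarity_network(cdr3s, max_dist=1):
    """Return (edges, clustered_nodes) for a collection of CDR3 sequences.

    Assumption: an edge links two distinct sequences whose Levenshtein
    distance is <= max_dist; clustered nodes have at least one edge.
    """
    seqs = sorted(set(cdr3s))
    edges = [(a, b) for a, b in combinations(seqs, 2)
             # A length gap above max_dist already rules the pair out
             if abs(len(a) - len(b)) <= max_dist
             and levenshtein(a, b) <= max_dist]
    clustered = {s for edge in edges for s in edge}
    return edges, clustered


# Made-up sequences: three close variants and one unrelated sequence
toy = ["CASSLGTEVFF", "CASSLGAEVFF", "CASSLGEVFF", "CASGDNQDTQYF"]
edges, clustered = similarity_network(toy)
print(len(edges), len(clustered))
```

In this toy set the three related sequences form a fully linked cluster (three edges, three clustered nodes) and the fourth sequence stays isolated. The all-pairs comparison is quadratic in the number of sequences, which is manageable for the top 1000 sequences per sample but would need indexing for whole repertoires.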

Data access

The sequence data from this study have been made publicly available (https://usegalaxy.org/u/erezgrn/h/network-tcrs).

Acknowledgements

We thank Benjamin Chain and Shalev Itzkovitz for helpful comments on the manuscript. This research was supported by grants from the Minerva Foundation with funding from the Federal German Ministry for Education and Research and the I-CORE Program of the Planning and Budgeting Committee and the Israel Science Foundation. AM was supported by the MD Moross Institute for Cancer Research.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Funding Information

This paper was supported by the following grants:

M.D. Moross Institute for Cancer Research to Asaf Madi.
Minerva Foundation Funding from the Federal German Ministry for Education and Research to Nir Friedman.
I-CORE Program of the Planning and Budgeting Committee and the Israel Science Foundation to Nir Friedman.

Additional information

Competing interests

The authors declare that no competing interests exist.

Author contributions

AM, Conceptualization, Formal analysis, Investigation, Writing—original draft.

AP, Conceptualization, Formal analysis, Investigation, Writing—original draft.

ES, Data curation, Formal analysis, Investigation, Writing—review and editing.

SR-Z, Investigation, Methodology, Writing—review and editing.

EG, Formal analysis, Investigation.

IZ, Investigation, Methodology.

TA, Formal analysis, Investigation.

FVL, Resources, Formal analysis, Investigation.

AS, Resources, Supervision, Investigation, Methodology.

JL, Resources, Investigation, Methodology.

PDS, Resources, Formal analysis, Investigation, Methodology.

IRC, Conceptualization, Supervision, Writing—original draft.

NF, Conceptualization, Supervision, Funding acquisition, Writing—original draft.

Ethics

Animal experimentation: This study was performed in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health. All of the animals were handled according to approved institutional animal care and use committee (IACUC) protocols (#24110116-2) of the Weizmann Institute of Science. The protocol was approved by the Committee on the Ethics of Animal Experiments of the Weizmann Institute of Science. Every effort was made to minimize suffering.

Additional files

Supplementary file 2. Summary of the data for the quad-KO mice, which are lacking four elements needed for physiological MHC-dependent antigen selection: MHC-I and -II molecules together with CD4 and CD8 co-receptor molecules (Van Laethem et al., 2007, 2013), and matched control WT mice.

Connected.nodes and edges refers to network statistics generated from the 1000 most frequent CDR3 sequences in each mouse.

DOI: http://dx.doi.org/10.7554/eLife.22057.025

Supplementary file 3. Summary of TCR-seq data used in this study, from 5 experimental conditions: (1) mice that were immunized with either Chicken Ovalbumin (OVA) or (2) peptide 277 (p277), of HSP60.

Spleens were harvested on day 7 post immunization and T cells were extracted for TCR analysis. (3) in vitro stimulation: T cells from spleens of immunized mice were harvested on day 7 and were re-stimulated with irradiated splenocytes and the relevant peptide antigen. (4) Five of the OVA-immunized mice received a boost IP injection of 100 μg OVA + CFA on day 14, and spleens were harvested on day 60 for TCR analysis. (5) DN thymocytes.

DOI: http://dx.doi.org/10.7554/eLife.22057.026

Major datasets

The following previously published dataset was used:

Friedman N, 2015, Young mice TCR repertoire, https://www.ncbi.nlm.nih.gov/sra/SRP042610, Publicly available at NCBI Sequence Read Archive (accession no: SRP042610)

References

Bashford-Rogers RJ, Palser AL, Huntly BJ, Rance R, Vassiliou GS, Follows GA, Kellam P. Network properties derived from deep sequencing of human B-cell receptor repertoires delineate B-cell populations. Genome Research. 2013;23:1874–1884. doi: 10.1101/gr.154815.113.
Ben-Hamo R, Efroni S. The whole-organism heavy chain B cell repertoire from zebrafish self-organizes into distinct network features. BMC Systems Biology. 2011;5:27. doi: 10.1186/1752-0509-5-27.
Britanova OV, Putintseva EV, Shugay M, Merzlyak EM, Turchaninova MA, Staroverov DB, Bolotin DA, Lukyanov S, Bogdanova EA, Mamedov IZ, Lebedev YB, Chudakov DM. Age-related decrease in TCR repertoire diversity measured with deep and normalized sequence profiling. The Journal of Immunology. 2014;192:2689–2698. doi: 10.4049/jimmunol.1302064.
Burrows SR, Silins SL, Khanna R, Burrows JM, Rischmueller M, McCluskey J, Moss DJ. Cross-reactive memory T cells for Epstein-Barr virus augment the alloresponse to common human leukocyte antigens: degenerate recognition of Major histocompatibility complex-bound peptide by T cells and its role in alloreactivity. European Journal of Immunology. 1997;27:1726–1736. doi: 10.1002/eji.1830270720.
Chen G, Yang X, Ko A, Sun X, Gao M, Zhang Y, Shi A, Mariuzza RA, Weng NP. Sequence and structural analyses reveal distinct and highly diverse human CD8(+) TCR repertoires to immunodominant viral antigens. Cell Reports. 2017;19:569–583. doi: 10.1016/j.celrep.2017.03.072.
Cline MS, Smoot M, Cerami E, Kuchinsky A, Landys N, Workman C, Christmas R, Avila-Campilo I, Creech M, Gross B, Hanspers K, Isserlin R, Kelley R, Killcoyne S, Lotia S, Maere S, Morris J, Ono K, Pavlovic V, Pico AR, Vailaya A, Wang PL, Adler A, Conklin BR, Hood L, Kuiper M, Sander C, Schmulevich I, Schwikowski B, Warner GJ, Ideker T, Bader GD. Integration of biological networks and gene expression data using cytoscape. Nature Protocols. 2007;2:2366–2382. doi: 10.1038/nprot.2007.324.
Cohen IR. The cognitive principle challenges clonal selection. Immunology Today. 1992;13:441–444. doi: 10.1016/0167-5699(92)90071-E.
Cohen IR. Tending Adam’s Garden: Evolving the Cognitive Immune Self. London: Academic Press; 2000.
Core Team R. R: A Language and Environment for Statistical Computing. Vienna: 2013.
Csardi G, Nepusz T. The igraph software package for complex network research. InterJournal, Complex Systems. 2006;1695.
Davis MM, Bjorkman PJ. T-cell antigen receptor genes and T-cell recognition. Nature. 1988;334:395–402. doi: 10.1038/334395a0.
Elhanati Y, Murugan A, Callan CG, Mora T, Walczak AM. Quantifying selection in immune receptor repertoires. PNAS. 2014;111:9875–9880. doi: 10.1073/pnas.1409572111.
Henderson LA, Volpi S, Frugoni F, Janssen E, Kim S, Sundel RP, Dedeoglu F, Lo MS, Hazen MM, Beth Son M, Mathieu R, Zurakowski D, Yu N, Lebedeva T, Fuhlbrigge RC, Walter JE, Nee Lee Y, Nigrovic PA, Notarangelo LD. Next-Generation sequencing reveals restriction and clonotypic expansion of Treg cells in Juvenile Idiopathic Arthritis. Arthritis & Rheumatology. 2016;68:1758–1768. doi: 10.1002/art.39606.
Klinger M, Pepin F, Wilkins J, Asbury T, Wittkop T, Zheng J, Moorhead M, Faham M. Multiplex identification of Antigen-Specific T cell receptors using a combination of immune assays and immune receptor sequencing. PLoS One. 2015;10:e0141561. doi: 10.1371/journal.pone.0141561.
Levenshtein VI. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady. 1966;10:707–710.
Madi A, Shifrut E, Reich-Zeliger S, Gal H, Best K, Ndifon W, Chain B, Cohen IR, Friedman N. T-cell receptor repertoires share a restricted set of public and abundant CDR3 sequences that are associated with self-related immunity. Genome Research. 2014;24:1603–1612. doi: 10.1101/gr.170753.113.
Menezes JS, van den Elzen P, Thornes J, Huffman D, Droin NM, Maverakis E, Sercarz EE. A public T cell clonotype within a heterogeneous autoreactive repertoire is dominant in driving EAE. Journal of Clinical Investigation. 2007;117:2176–2185. doi: 10.1172/JCI28277.
Miles JJ, Douek DC, Price DA. Bias in the αβ T-cell repertoire: implications for disease pathogenesis and vaccination. Immunology and Cell Biology. 2011;89:375–387. doi: 10.1038/icb.2010.139.
Morgan M, Anders S, Lawrence M, Aboyoun P, Pagès H, Gentleman R. ShortRead: a bioconductor package for input, quality assessment and exploration of high-throughput sequence data. Bioinformatics. 2009;25:2607–2608. doi: 10.1093/bioinformatics/btp450.
Moss PA, Moots RJ, Rosenberg WM, Rowland-Jones SJ, Bodmer HC, McMichael AJ, Bell JI. Extensive conservation of alpha and beta chains of the human T-cell antigen receptor recognizing HLA-A2 and influenza A matrix peptide. PNAS. 1991;88:8987–8990. doi: 10.1073/pnas.88.20.8987.
Motz GT, Eppert BL, Sun G, Wesselkamper SC, Linke MJ, Deka R, Borchers MT. Persistence of lung CD8 T cell oligoclonal expansions upon smoking cessation in a mouse model of cigarette smoke-induced emphysema. The Journal of Immunology. 2008;181:8036–8043. doi: 10.4049/jimmunol.181.11.8036.
Murugan A, Mora T, Walczak AM, Callan CG. Statistical inference of the generation probability of T-cell receptors from sequence repertoires. PNAS. 2012;109:16161–16166. doi: 10.1073/pnas.1212755109.
Nakano N, Kikutani H, Nishimoto H, Kishimoto T. T cell receptor V gene usage of islet beta cell-reactive T cells is not restricted in non-obese diabetic mice. Journal of Experimental Medicine. 1991;173:1091–1097. doi: 10.1084/jem.173.5.1091.
Ndifon W, Gal H, Shifrut E, Aharoni R, Yissachar N, Waysbort N, Reich-Zeliger S, Arnon R, Friedman N. Chromatin conformation governs T-cell receptor jβ gene segment usage. PNAS. 2012;109:15865–15870. doi: 10.1073/pnas.1203916109.
Osman GE, Toda M, Kanagawa O, Hood LE. Characterization of the T cell receptor repertoire causing collagen arthritis in mice. Journal of Experimental Medicine. 1993;177:387–395. doi: 10.1084/jem.177.2.387.
Handcock MS. Relative distribution methods. 1.6-3. 2014.
Robert L, Tsoi J, Wang X, Emerson R, Homet B, Chodon T, Mok S, Huang RR, Cochran AJ, Comin-Anduix B, Koya RC, Graeber TG, Robins H, Ribas A. CTLA4 blockade broadens the peripheral T-cell receptor repertoire. Clinical Cancer Research. 2014;20:2424–2432. doi: 10.1158/1078-0432.CCR-13-2648.
Robins HS, Srivastava SK, Campregher PV, Turtle CJ, Andriesen J, Riddell SR, Carlson CS, Warren EH. Overlap and effective size of the human CD8+ T cell receptor repertoire. Science Translational Medicine. 2010;2:47ra64. doi: 10.1126/scitranslmed.3001442.
Sainz-Perez A, Lim A, Lemercier B, Leclerc C. The T-cell receptor repertoire of tumor-infiltrating regulatory T lymphocytes is skewed toward public sequences. Cancer Research. 2012;72:3557–3569. doi: 10.1158/0008-5472.CAN-12-0277.
Saito R, Smoot ME, Ono K, Ruscheinski J, Wang PL, Lotia S, Pico AR, Bader GD, Ideker T. A travel guide to Cytoscape plugins. Nature Methods. 2012;9:1069–1076. doi: 10.1038/nmeth.2212.
Serana F, Sottini A, Caimi L, Palermo B, Natali PG, Nisticò P, Imberti L. Identification of a public CDR3 motif and a biased utilization of T-cell receptor V beta and J beta chains in HLA-A2/Melan-A-specific T-cell clonotypes of melanoma patients. Journal of Translational Medicine. 2009;7:21. doi: 10.1186/1479-5876-7-21.
Shifrut E, Baruch K, Gal H, Ndifon W, Deczkowska A, Schwartz M, Friedman N. CD4(+) T Cell-Receptor repertoire diversity is compromised in the spleen but not in the bone marrow of Aged mice due to private and sporadic clonal expansions. Frontiers in Immunology. 2013;4:379. doi: 10.3389/fimmu.2013.00379.
Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics. 2011;27:431–432. doi: 10.1093/bioinformatics/btq675.
Sun Y, Best K, Cinelli M, Heather JM, Reich-Zeliger S, Shifrut E, Friedman N, Shawe-Taylor J, Chain B. Specificity, Privacy, and degeneracy in the CD4 T cell receptor repertoire following immunization. Frontiers in Immunology. 2017;8:430. doi: 10.3389/fimmu.2017.00430.
Thomas PG, Handel A, Doherty PC, La Gruta NL. Ecological analysis of antigen-specific CTL repertoires defines the relationship between naive and immune T-cell populations. PNAS. 2013;110:1839–1844. doi: 10.1073/pnas.1222149110.
Tickotsky N, Sagiv T, Prilusky J, Shifrut E, Friedman N. McPAS-TCR: A manually-curated catalogue of pathology-associated T cell receptor sequences. Bioinformatics. 2017:btx286. doi: 10.1093/bioinformatics/btx286.
Tikhonova AN, Van Laethem F, Hanada K, Lu J, Pobezinsky LA, Hong C, Guinter TI, Jeurling SK, Bernhardt G, Park JH, Yang JC, Sun PD, Singer A. β T cell receptors that do not undergo Major histocompatibility complex-specific thymic selection possess antibody-like recognition specificities. Immunity. 2012;36:79–91. doi: 10.1016/j.immuni.2011.11.013.
Tikochinski Y, Elias D, Steeg C, Marcus H, Kantorowitz M, Reshef T, Ablamunits V, Cohen IR, Friedmann A. A shared TCR CDR3 sequence in NOD mouse autoimmune diabetes. International Immunology. 1999;11:951–956. doi: 10.1093/intimm/11.6.951.
van der Loo M. The stringdist package for approximate string matching. The R Journal. 2014;6:111–122.
Van Laethem F, Sarafova SD, Park JH, Tai X, Pobezinsky L, Guinter TI, Adoro S, Adams A, Sharrow SO, Feigenbaum L, Singer A. Deletion of CD4 and CD8 coreceptors permits generation of alphabetaT cells that recognize antigens independently of the MHC. Immunity. 2007;27:735–750. doi: 10.1016/j.immuni.2007.10.007.
Van Laethem F, Tikhonova AN, Pobezinsky LA, Tai X, Kimura MY, Le Saout C, Guinter TI, Adams A, Sharrow SO, Bernhardt G, Feigenbaum L, Singer A. Lck availability during thymic selection determines the recognition specificity of the T cell repertoire. Cell. 2013;154:1326–1341. doi: 10.1016/j.cell.2013.08.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
Venturi V, Kedzierska K, Price DA, Doherty PC, Douek DC, Turner SJ, Davenport MP. Sharing of T cell receptors in antigen-specific responses is driven by convergent recombination. PNAS. 2006;103:18691–18696. doi: 10.1073/pnas.0608907103. [DOI] [PMC free article] [PubMed] [Google Scholar]
Venturi V, Price DA, Douek DC, Davenport MP. The molecular basis for public T-cell responses? Nature Reviews Immunology. 2008;8:231–238. doi: 10.1038/nri2260. [DOI] [PubMed] [Google Scholar]
Wickham H. Ggplot2: Elegant Graphics for Data Analysis. New York: Springer; 2009. [Google Scholar]
Wooldridge L, Ekeruche-Makinde J, van den Berg HA, Skowera A, Miles JJ, Tan MP, Dolton G, Clement M, Llewellyn-Lacey S, Price DA, Peakman M, Sewell AK. A single autoimmune T cell receptor recognizes more than a million different peptides. Journal of Biological Chemistry. 2012;287:1168–1177. doi: 10.1074/jbc.M111.289488. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zeileis A. Ineq: Measuring Inequality, Concentration, and Poverty. 2012. [Google Scholar]
Zoete V, Irving M, Ferber M, Cuendet MA, Michielin O. Structure-Based, rational design of T Cell Receptors. Frontiers in Immunology. 2013;4:268. doi: 10.3389/fimmu.2013.00268. [DOI] [PMC free article] [PubMed] [Google Scholar]

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "T cell receptor repertoires of mice and humans are clustered in similarity networks around conserved public sequences" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by Arup Chakraborty as the Reviewing Editor and Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

The paper presents insightful analyses of T cell receptor sequence repertoires. Network analysis is used to identify clusters of CDR3β sequences, which are over-represented in peripheral T cell repertoires and are often found in multiple individual mice or humans (so-called public TCRs). The analysis primarily concerns sequence data that was collected by the authors from a few distinct strains of mice (inbred wild type, plus a couple of immune system knockout strains). In addition, published human repertoire data is used to explore cross-species relevance of results obtained by analyzing mouse data. The authors focus on the most abundant sequences in the repertoire derived from individual animals (roughly the top 0.3 percent). They then ask whether these abundant sequences are distinguished by any global characteristics.

The authors find that the abundant sequences can largely be decomposed into subsets such that every sequence in a subset is connected to some other sequence in the same subset by at most one substitution, insertion or deletion. The second observation is that the large clusters contain (or have sequences that are one substitution/insertion/deletion away from) members of a group of 124 TCR sequences that have previously been annotated as responders to specific, identifiable antigens. Interestingly, and enigmatically, the annotated TCR sequences that connect to these clusters mostly have annotations associated with self-reactivity. The authors develop these basic observations in several directions: 1) they show that a group of sequences selected for being present in multiple individual mice (25 out of 28) have similar properties of clustering and association with known antigens; 2) they show that similar clustering and antigen association of abundant TCR sequences occurs in human data and that there is strong overlap between abundant sequences in the two species; 3) they perform related analyses on mice in which various elements of the adaptive immune system are knocked out and show that the sequence cluster organization of abundant sequences is not present if T cell activation is not possible (by knocking out the antigen-presenting MHC complex, for example). In other words, the sequence cluster organization is a product of T cell activation and response. 4) The authors go on to show that T cell selection provides a competitive advantage for selecting T cells that carry the more frequent TCR CDR3 sequences, i.e., T cell selection limits diversity by selecting against thymocytes expressing low-frequency, highly variable CDR3bs.
In addition, the authors provide evidence that T cell primary responses, CTLA4 checkpoint blockade and aging disrupt the "normal" TCR CDR3b frequencies; i.e., immune response diversifies the hierarchy of T cell CDR3b sequence frequencies, presumably by expanding low-frequency T cells specific for particular antigens. The authors speculate that these frequency networks are reflective of, or perhaps required for, proper T cell immune homeostasis.
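
The clustering rule summarized above (two sequences are linked when they differ by at most one substitution, insertion or deletion, and a cluster is a connected set of linked sequences) can be illustrated with a minimal Python sketch. This is not the authors' implementation; the function names `within_one_edit` and `similarity_clusters` are hypothetical, and the all-pairs comparison is only practical for small inputs such as the most frequent sequences of one repertoire.

```python
from collections import defaultdict
from itertools import combinations


def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by at most one substitution, insertion or deletion."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one mismatching position means one substitution.
        return sum(x != y for x, y in zip(a, b)) == 1
    # Lengths differ by one: make `a` the shorter string, then check whether
    # deleting a single character from `b` yields `a`.
    if len(a) > len(b):
        a, b = b, a
    return any(b[:i] + b[i + 1:] == a for i in range(len(b)))


def similarity_clusters(seqs):
    """Connected components of the one-edit similarity graph, largest first."""
    seqs = sorted(set(seqs))
    neighbours = defaultdict(set)
    for s, t in combinations(seqs, 2):
        if within_one_edit(s, t):
            neighbours[s].add(t)
            neighbours[t].add(s)
    seen, clusters = set(), []
    for s in seqs:
        if s in seen:
            continue
        # Depth-first traversal collects everything reachable from s.
        stack, component = [s], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(component)
    return sorted(clusters, key=len, reverse=True)


# Toy strings for illustration only; they are not taken from the study's data.
toy = ["CASSLGYEQYF", "CASSLGAYEQYF", "CASSPGAYEQYF", "CASGDTLYF"]
clusters = similarity_clusters(toy)
```

Note that the first and third toy strings are not directly linked, yet they fall into the same cluster through the second one, which is the sense in which every sequence in a subset is connected to "some other sequence" in that subset.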

While the paper is interesting, a number of points need to be addressed.

Major points to be addressed:

There is an over-emphasis on describing the network with minimal provision of primary data; e.g., it would be helpful to provide the actual sequences of each node.
Given the subject matter, there is a general lack of discussion regarding V-D-J recombination. Specifically, the following points need to be clarified:

Preface: human and mouse Db1 and Jb2.7 genes are 100% homologous (Db1 nucleotide, Jb2.7 AA sequence). Human Jb2.3 is highly similar to mouse Jb2.5, identical if 2 AA of the Jb are "chewed back", which is relatively common during V-D-J rearrangement. Because of this sequence homology, CDR3s made from template-only V-D-J recombination using many Vbs, Db1 and Jb2.7 will by definition be identical in mouse and human. It stands to reason that insertions/deletions of these gene segments during recombination will also generate the identical sequences at a reasonable frequency.

"We discovered an unexpected number of public CDR3-TCRβ segments that were identical in mice and humans." Is this more so than would be expected given the extensive sequence homology between mouse and human Db/Jbs?

"These findings propose that similar driving forces may generate and expand particular public CDR3 TCR sequences that contain conserved sequence motifs in the two species." Given that template only V-D-J recombination of Db1 and Jb2.7 (or Jb2.3) would give identical TCRb CDR3 sequences, isn't sequence homology the evolutionary basis of public CS CDR3s?

Clarity of discussion of how CDR3β sequences relate to antigen specificity of a TCR. This evidence needs to be spelled out a bit in the main text. The reader is referred to other papers, but the point is so important that it would be appropriate to have a self-contained summary exposition in the paper itself.
Given that several aspects of novelty that the authors are claiming are known in other context or are predictable, the authors should directly test their hypothesis that disrupted TCR CDR3β networks are at the minimum a "biomarker" for the disease state; e.g., are there TCR CDR3β network signatures of chronic infection? There are several mouse models (LCMV, TB etc.) or human conditions that could be used as source material.
This point concerns Figure 3A and the discussion around it. Why all the nodes in each cluster are colored in the same way is not clear. Only a few nodes in a given cluster are identical to, or one step away from, one of the 124 annotated TCR sequences. Is the implication of the color scheme that any node in the cluster is expected to be responsive to one of the antigens that are identical (or close to) at least one node in the cluster? The discussion of this point is not entirely clear.
In Figure 4 and the surrounding discussion, mention is made of network analysis of repertoires obtained from DN (double negative CD4- CD8-) thymocytes. This data set is not mentioned in Materials and methods, nor is any link to a repository provided. These data are extremely important as they bear on the question whether the highly shared TCR sequences are abundant because of antigen reaction and clonal expansion or due to some other cause. More detailed information about this data set should be given (how many sequences per DN mouse etc.) and, ideally, a pointer to the repository of this data should be given. The data repository should give the nucleotide sequences and not just the amino acid sequences of the CDR3 since the text makes a point of the difference in the number of nt realizations of specific CDR3 aa sequences when comparing the DN mice with the WT mice.

Major points to be addressed:

1) There is an over-emphasis on describing the network with minimal provision of primary data; e.g., it would be helpful to provide the actual sequences of each node.

As it is visually problematic to present the entire network with the actual sequences, we added to Figure 1 in the main text an inset panel that captures one of the main clusters with the AA sequences for each node presented. This example demonstrates the construction method of these networks.

2) Given the subject matter, there is a general lack of discussion regarding V-D-J recombination. Specifically, the following points need to be clarified:

We thank the reviewers for raising this point, which made us refine our statement. In order to answer this question we generated simulated TCR repertoires using V-D-J genes of either mouse or human. Other parameters of the simulations (for example, frequencies of usage of the V-D-J genes, frequencies of random nucleotide insertions/deletions) were determined from data (using "unselected" TCR sequences from the data, i.e. those that have a stop codon or are out of frame, about 1-2% of the sequences). We used a similar method in a previous publication: Madi et al., Genome Research 2014.

We generated 100 datasets of simulated repertoires of 28 mice, with repertoire sizes matching those of the 28 experimental murine repertoires. For each of these 100 datasets, mouse simulated public sequences were defined as sequences that appear in 25 simulated repertoires or more (as we did for the experimental data). Similarly, we generated 100 datasets of simulated repertoires of 11 humans each (simulation parameters were extracted from the data of Britanova et al., 2014, from which we also used the data in the original analysis). Human simulated public sequences were defined as those that appear in all 11 human repertoires in a simulated dataset, as done for the experimental data. Finally, simulated cross-species public sequences were defined based on the overlap between the simulated datasets.
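
The sharing thresholds described above can be written down as a short Python sketch. It assumes each repertoire is available as a collection of CDR3 amino acid strings, and it reads "overlap" as a plain set intersection of the mouse-public and human-public sets, which is an interpretation rather than something the text spells out. The function names are hypothetical.

```python
from collections import Counter


def public_sequences(repertoires, min_individuals):
    """Sequences present in at least `min_individuals` of the repertoires."""
    counts = Counter()
    for repertoire in repertoires:
        # Use a set so that each sequence counts once per individual,
        # regardless of its clone size.
        counts.update(set(repertoire))
    return {seq for seq, n in counts.items() if n >= min_individuals}


def cross_species_public(mouse_reps, human_reps, mouse_min, human_min):
    """Assumed definition: sequences public in both species (set intersection)."""
    return (public_sequences(mouse_reps, mouse_min)
            & public_sequences(human_reps, human_min))


# Placeholder repertoires with single-letter stand-ins for CDR3 sequences.
mouse = [{"A", "B"}, {"A", "C"}, {"A", "B"}]
human = [{"A", "B"}, {"A"}]
mouse_public = public_sequences(mouse, 2)
cs_public = cross_species_public(mouse, human, mouse_min=2, human_min=2)
```

With the thresholds stated in the response, the calls would use `mouse_min=25` for 28 mouse repertoires and `human_min=11` for 11 human repertoires.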

We then tested these cross-species public sequences for J segment usage. Most of these sequences indeed had the mouse Jb2.7 or 2.5, and the human 2.7 or 2.3. Thus, sequence homology between these J segments does contribute to the generation of CS-public sequences. However, this by itself is not a sufficient condition, as there are many sequences that have these J segments, and are public in one species but not in the other (both in the simulations and in the experimental data).

We note that in these simulations, cross-species public sequences were generated at a somewhat higher number than in the experimental data: 221 CS-public on average in the simulations, compared with 86 CS-public in the experimental data. Thus, we don’t use “unexpected” to describe the number of CS-public sequences found. We are aware that these simulations somewhat overestimate the sharing between individual repertoires in both species (as noted in our Genome Research paper). This may stem from inaccuracies in the simulation assumptions, or due to the fact that we simulate only the random generation process of TCRs, but not their selection (in the thymus and in the periphery).

As for the actual sequences, of the 86 CS-public sequences in the data, only 54 are CS-public also in the simulations. The other 32 are found in the simulations at lower sharing levels. This is presented in Figure 2—figure supplement 2. Combined, these results show that the simulations predict the existence of CS-public TCR CDR3β sequences, but not all sequences that are observed as CS-public are also CS-public in the simulation, and vice versa. This suggests that conserved selective forces shape and refine the likelihood of a CDR3β sequence to be CS-public.

To conclude, this analysis suggests that sequence homology together with the parameters of the random generation process drive the generation of CS clones; however, their existence is further pruned and shaped by selective forces that are absent from the simulation.

We added in the revised manuscript description of the simulations and their results, and revised the discussion of CS-public clones accordingly.

As a further test for other forces that influence the level of CS-public sequences, we compared the abundance of CS-public sequences in repertoires of the quad-KO mice, which are not subject to MHC-dependent selection, to those of control WT mice. We made a similar comparison also with repertoires of DN thymocytes, which represent the landscape of generated TCRs before thymic selection. The cumulative frequencies of the CS CDR3 sequences in two sets of experiments done with WT mice (the 28 WT mice used in the network analysis, and the WT mice used as controls in the quad-KO experiment) show no significant difference (P value = 0.293). On the other hand, the quad-KO mice exhibited a lower total frequency of the CS CDR3s compared with both the 28 WT mice (P value = 4.318e-09) and the QuadWT controls (P value = 0.01781). The cumulative frequency in the DN thymocytes shows a similar trend, but without statistical significance (P value = 0.1877). Together, these results indicate that although sequence homology defines a probability for a sequence to be shared between the two species, additional selection forces influence its abundance.
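
As a rough illustration of the quantity being compared here, the cumulative frequency of a sequence set within one repertoire can be computed as the fraction of all reads (or cells) carried by those sequences. The sketch below assumes a repertoire is given as a mapping from CDR3 sequence to clone count; the function name is hypothetical, and the statistical test behind the reported P values is not named in the text, so it is not reproduced.

```python
def cumulative_frequency(clone_counts, sequences_of_interest):
    """Fraction of the repertoire's total count carried by the given sequences."""
    total = sum(clone_counts.values())
    if total == 0:
        return 0.0
    hits = sum(n for seq, n in clone_counts.items()
               if seq in sequences_of_interest)
    return hits / total


# Placeholder counts; "A" and "C" stand in for CS-public CDR3 sequences.
repertoire = {"A": 3, "B": 1, "C": 6}
freq = cumulative_frequency(repertoire, {"A", "C"})
```

One such value per animal would then feed a group comparison (for example quad-KO versus WT).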

These results have now been added to the main text, and as a new panel in Figure 4.

3) Clarity of discussion of how CDR3β sequences relate to antigen specificity of a TCR. This evidence needs to be spelled out a bit in the main text. The reader is referred to other papers, but the point is so important that it would be appropriate to have a self-contained summary exposition in the paper itself.

To expand the discussion of how CDR3β sequences relate to antigen specificity, we added the following to the text, when describing Figure 3: “The functional TCR is formed by a complex of TCR α and β chains (Davis and Bjorkman 1988), hence one cannot attribute specific antigen recognition to CDR3β segments alone. […] Some insight on antigen specificity can be gained by linking the sequence-similarity networks to previously annotated TCR sequences.”

4) Given that several aspects of novelty that the authors are claiming are known in other context or are predictable, the authors should directly test their hypothesis that disrupted TCR CDR3β networks are at the minimum a "biomarker" for the disease state; e.g., are there TCR CDR3β network signatures of chronic infection? There are several mouse models (LCMV, TB etc.) or human conditions that could be used as source material.

The data presented in the manuscript, specifically in Figure 5, introduce various perturbations which are associated with network deterioration. First, we show that exposure and re-exposure of the repertoire to an antigen via immunization leads to network deterioration, due to preferential expansion of relatively private clones at the expense of the more highly connected public clones that support the network structure. This was observed for immunization with a foreign antigen (OVA) and also with a self-antigen (HSP60). We also show that recovery following exposure in the OVA immunized mice restored a naive-like network structure. Most of this analysis in the paper is based on new experiments that we conducted, which were not described in our previous publications.

We further show that network connectivity in human patients was significantly reduced following the strong perturbation associated with immune checkpoint blockade. Lastly, we show a strong correlation between aging, associated with decreased immune functioning, and network connectivity. Together we conclude that network strength is a proxy for immune state.

Following the reviewers’ suggestion, we searched for examples of network structure and sharing modifications in other diseases. We added a new analysis to the revised manuscript, not included in the original manuscript, of patients with the autoimmune disease Juvenile Idiopathic Arthritis (JIA) (Henderson et al. 2016). We found that there was a strong increase of public (network promoting) TCRs in the peripheral blood of JIA patients compared to healthy donors (P value = 0.0006, see figure below). This finding shows that while immune perturbations such as immunization and aging lead to expansion of private clones and network reduction, this specific autoimmune condition is associated with an increased level of public clones which are putatively associated with self-antigens. Taken together, our analysis supports the idea that the level of network connectivity, frequency of public TCRs and repertoire evenness or skewing are linked to each other, and are concurrently modulated by immune state – disease / immunization / aging. However, given the magnitude of the observed effect of disease on network structure, larger datasets of TCR repertoires of sick vs. healthy people are required in order to define network-based features as biomarkers. Further work is required to indicate the conditions / diseases that modulate network structure more than others, and also maybe to refine the populations of T cells used for network generation (e.g. build networks of effector, or memory T cell TCRs). These directions are under investigation by us, beyond the scope of the current manuscript.

We have added to the revised version a panel (Figure 5I), text that describes these results, and also refer to them in the Discussion.

5) This point concerns Figure 3A and the discussion around it. Why all the nodes in each cluster are colored in the same way is not clear. Only a few nodes in a given cluster are identical to, or one step away from, one of the 124 annotated TCR sequences. Is the implication of the color scheme that any node in the cluster is expected to be responsive to one of the antigens that are identical (or close to) at least one node in the cluster? The discussion of this point is not entirely clear.

We thank the reviewers for noting this. The color of the clusters was meant only as a way of distinguishing between clusters in the figure, not to define antigen specificity, which is now further clarified in the figure legend. We note (and explain in the text, including specific examples) that clusters can include sequences of different annotations. Thus, clusters are not homogeneous in terms of antigen specificity, despite the high level of sequence similarity. Some nodes, in particular close ones, may share the same antigen (as indicated by annotated clonotypes in some cases), but this is not general. This issue is of course further complicated by TCR cross-reactivity. Also, as we note in the text, TCR specificity can also depend on the TCRα part of the receptor. To clarify our notation, we have added a supplemental figure (Figure 3—figure supplement 1), in which we specifically label the nodes (CDR3 sequences) in one large cluster, and identify those that are annotated. We also improved the presentation of annotated clonotypes in Figure 3A, and revised the text that discusses this figure.

6) In Figure 4 and the surrounding discussion, mention is made of network analysis of repertoires obtained from DN (double negative CD4- CD8-) thymocytes. This data set is not mentioned in Materials and methods, nor is any link to a repository provided. These data are extremely important as they bear on the question whether the highly shared TCR sequences are abundant because of antigen reaction and clonal expansion or due to some other cause. More detailed information about this data set should be given (how many sequences per DN mouse etc.) and, ideally, a pointer to the repository of this data should be given. The data repository should give the nucleotide sequences and not just the amino acid sequences of the CDR3 since the text makes a point of the difference in the number of nt realizations of specific CDR3 aa sequences when comparing the DN mice with the WT mice.

We regret this omission in our submission. We have added description of this data to the Materials and methods section, added tables that describe these datasets (as well as the quad-KO data, which was also missing) to the supplementary information (no. of sequences per mouse, etc.,) (Supplementary file 2 and Supplementary file 3), and also uploaded the nt data to a public repository (https://usegalaxy.org/u/erezgrn/h/network-tcrs).

Supplementary Materials

Connected.nodes and edges refers to network statistics generated from the 1000 most frequent CDR3 sequences in each mouse.

DOI: http://dx.doi.org/10.7554/eLife.22057.025

Supplementary file 3. Summary of TCR-seq data used in this study, from 5 experimental conditions: (1) mice that were immunized with either Chicken Ovalbumin (OVA) or (2) peptide 277 (p277), of HSP60.

DOI: http://dx.doi.org/10.7554/eLife.22057.026
