RAG1 core and V(D)J recombination signal sequences were derived from Transib transposons

Vladimir V Kapitonov et al. PLoS Biol. 2005 Jun.

Abstract

The V(D)J recombination reaction in jawed vertebrates is catalyzed by the RAG1 and RAG2 proteins, which are believed to have emerged approximately 500 million years ago from transposon-encoded proteins. Yet no transposase sequence similar to RAG1 or RAG2 has been found. Here we show that the approximately 600-amino-acid "core" region of RAG1 required for its catalytic activity is significantly similar to the transposase encoded by DNA transposons that belong to the Transib superfamily. This superfamily was discovered recently based on computational analysis of the fruit fly and African malaria mosquito genomes. Transib transposons are also present in the genomes of sea urchin, yellow fever mosquito, silkworm, dog hookworm, hydra, and soybean rust. We demonstrate that recombination signal sequences (RSSs) were derived from the terminal inverted repeats (TIRs) of an ancient Transib transposon. Furthermore, the critical DDE catalytic triad of RAG1 is shared with the Transib transposase as part of conserved motifs. We also studied several divergent proteins encoded by the sea urchin and lancelet genomes that are 25%–30% identical to the RAG1 N-terminal domain and the RAG1 core. Our results provide the first direct evidence linking RAG1 and RSSs to a specific superfamily of DNA transposons and indicate that the V(D)J machinery evolved from transposons. We propose that only the RAG1 core was derived from the Transib transposase, whereas the N-terminal domain was assembled from separate proteins of unknown function that may still be active in sea urchin, lancelet, hydra, and starlet sea anemone. We also suggest that the RAG2 protein was not encoded by ancient Transib transposons but emerged in jawed vertebrates as a counterpart of RAG1 necessary for the V(D)J recombination reaction.


Figures

Figure 1. Schematic Presentation of Transib transposons, RAG1, RAG2, and RAG1-Like Proteins in Eukaryotes

The evolutionary timescale of the tree is based on published literature [49–51]. Red circles mark species in which Transib transposases (TPases) were found. Gray squares indicate RAG2; orange and blue ellipses show the RAG1 core and the RAG1 N-terminal domain, respectively. Overall taxonomy, including common and Latin names, is given on the right side of the figure. A question mark at the lamprey lineage indicates insufficient sequence data. A branch without labels means that neither the Transib TPase nor RAG1/2 was found in the sequenced portions of the corresponding genome. Among branches lacking Transib TPases, only the lamprey and crocodile genomes have not yet been extensively sequenced. In sea anemone, the RAG1 core–like protein is capped by the ring finger motif, which also forms the C-terminus of the RAG1 N-terminal domain. In fungi, the Transib TPase was detected only in soybean rust.

Figure 2. Diversity of the Transib TPases and RAG1 Core–Like Proteins in Animals

The phylogenetic tree was obtained using the neighbor-joining algorithm implemented in MEGA [44]. The evolutionary distance between each pair of protein sequences was measured as the proportion of amino acid (aa) sites at which the two sequences differ; the horizontal bar shows the distance scale. Bootstrap values higher than 60% are shown at the corresponding nodes. Species abbreviations are as follows: AA, yellow fever mosquito; AG, African malaria mosquito; BF, lancelet; CL, bull shark; DP, D. pseudoobscura fruit fly; FR, fugu fish; HM, hydra; HS, human; NV, starlet sea anemone; SP, sea urchin; XL, frog. (Transib1 through Transib5 are from the fruit fly D. melanogaster.)
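The distance described above is what MEGA calls the p-distance. The sketch below is a hypothetical illustration, not MEGA's code: `p_distance` computes that proportion for two aligned sequences, assuming gap columns are skipped pairwise (the caption does not say how gaps were handled), and `neighbor_joining` is a textbook Saitou-Nei joining loop that returns only the unrooted topology, omitting branch lengths and bootstrap resampling. The toy sequences are invented.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of compared aligned sites at which two sequences differ.

    Assumption: a column with a gap ('-') in either sequence is skipped
    (pairwise deletion).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differences / compared


def neighbor_joining(dist: dict) -> str:
    """Join taxa with the standard neighbor-joining criterion.

    `dist` maps frozenset({i, j}) to the distance between taxa i and j.
    Stops at three nodes (an unrooted trifurcation) and returns a
    Newick-like topology without branch lengths.
    """
    nodes = sorted({name for pair in dist for name in pair})
    d = dict(dist)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence r(i): sum of distances from i to every other node.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        u = f"({i},{j})"
        dij = d[frozenset((i, j))]
        for k in nodes:
            if k not in (i, j):
                # Distance from the new internal node to each remaining node.
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                )
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    return f"({','.join(nodes)});"


# Toy example with invented sequences (not data from the paper).
seqs = {"A": "MKVLA", "B": "MKVLG", "C": "MRILG", "D": "TRIAG"}
dist = {frozenset(p): p_distance(seqs[p[0]], seqs[p[1]]) for p in combinations(seqs, 2)}
tree = neighbor_joining(dist)
print(tree)
```

With four taxa only one join is needed, so the result shows which pair NJ groups together; here A with B (equivalently C with D).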

Figure 3. Multiple Alignment of Ten Conserved Motifs in the RAG1 Core Proteins and Transib TPases

The motifs are underlined and numbered from 1 to 10. The starting position of each motif immediately follows the corresponding protein name. Distances between the motifs are given as numbers of aa residues. Black circles denote the conserved residues that form the RAG1/Transib catalytic DDE triad. The RAG1 proteins are as follows: RAG1_XL (GenBank GI no. 2501723, Xenopus laevis, frog), RAG1_HS (4557841, Homo sapiens, human), RAG1_GG (131826, Gallus gallus, chicken), RAG1_CL (1470117, Carcharhinus leucas, bull shark), RAG1_FR (4426834, Fugu rubripes, fugu fish). The coloring scheme [43] reflects the physicochemical properties of amino acids: black shading marks hydrophobic residues; blue indicates charged (white font), positively charged (red font), and negatively charged (green font); red indicates proline (blue font) and glycine (green font); gray indicates aliphatic (red font) and aromatic (blue font); green indicates polar (black font) and amphoteric (red font); and yellow indicates tiny (blue font) and small (green font). The species abbreviations for the Transib transposons are as follows: AA, yellow fever mosquito; AG, African malaria mosquito; DP, D. pseudoobscura fruit fly. (Transib1 through Transib5 are from the fruit fly D. melanogaster.)
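Reading a motif's starting position or an inter-motif distance off an alignment means counting residues while skipping gap characters. A minimal sketch of that bookkeeping, assuming gaps are written as '-' and that the distance between two motifs is the number of residues separating them; the function names are hypothetical and this is not the authors' procedure.

```python
def residue_number(aligned: str, column: int, first_residue: int = 1) -> int:
    """Residue number of the amino acid at a 0-based alignment column.

    `first_residue` is the residue number of the first non-gap character,
    for alignments that start partway into a protein.
    """
    if aligned[column] == "-":
        raise ValueError("this sequence has a gap at that column")
    # Count only real residues (not gaps) to the left of the column.
    return first_residue + sum(ch != "-" for ch in aligned[:column])


def gaps_between_motifs(motifs):
    """Residues separating consecutive motifs.

    `motifs` is a list of (start_residue, length) pairs in sequence order.
    """
    return [s2 - (s1 + n1) for (s1, n1), (s2, _) in zip(motifs, motifs[1:])]


print(residue_number("MK-DE", 3))                        # 3
print(gaps_between_motifs([(10, 5), (20, 4), (30, 6)]))  # [5, 6]
```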

Figure 4. Structural Similarities between the Transib TIRs and V(D)J RSS Signals

The species abbreviations are: AA, yellow fever mosquito; AG, African malaria mosquito; DM, D. melanogaster fruit fly; DP, D. pseudoobscura fruit fly; SP, sea urchin. (Transib1 through Transib5 are from the fruit fly D. melanogaster.) (A) Frequencies of the most frequent nucleotide at each position of the consensus sequence of the 5′ terminal inverted repeats (TIRs) of transposons belonging to 20 families of Transib transposons identified in fruit flies and mosquitoes. The RSS23 consensus sequence is shown immediately below the TIR consensus sequence. The most conserved nucleotides in the RSS23 heptamer and nonamer, which are necessary for efficient V(D)J recombination, are highlighted. The 23 ± 1 bp variable spacer is marked by Ns. (B) Ungapped alignment of the consensus sequences of 5′ TIRs from 21 families of Transib transposons. (C) The 12/23 rule follows from the basic structure of the TIRs in the consensus sequences of transposons belonging to the Transib5, Transib2_AG, TransibN1_AG, TransibN2_AG, and TransibN3_AG families. The 5′ TIRs of these transposons are aligned with the corresponding 3′ TIRs; the structures of the 5′ and 3′ TIRs resemble RSS12 and RSS23, respectively.
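Panel A's per-position frequencies and the 12/23 rule in panel C can both be illustrated with a short sketch. It assumes equal-length, ungapped TIR sequences; uses the standard RSS consensus heptamer CACAGTG and nonamer ACAAAAACC; and reads a signal as heptamer, spacer, nonamer starting from the coding-flank side. The one-mismatch tolerance and the ±1 bp window around a 12 bp spacer are assumptions (the caption gives only 23 ± 1 bp for RSS23), and all function names are hypothetical.

```python
from collections import Counter

HEPTAMER = "CACAGTG"   # consensus RSS heptamer
NONAMER = "ACAAAAACC"  # consensus RSS nonamer


def consensus_profile(seqs):
    """(most frequent base, its frequency) at each position of equal-length sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must have equal length")
    profile = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        profile.append((base, count / len(seqs)))
    return profile


def mismatches(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def spacer_class(signal: str, max_mismatches: int = 1):
    """Return '12' or '23' for a heptamer-spacer-nonamer signal, else None."""
    if len(signal) < len(HEPTAMER) + len(NONAMER):
        return None
    if mismatches(signal[:7], HEPTAMER) > max_mismatches:
        return None
    if mismatches(signal[-9:], NONAMER) > max_mismatches:
        return None
    spacer = len(signal) - len(HEPTAMER) - len(NONAMER)
    if abs(spacer - 12) <= 1:
        return "12"
    if abs(spacer - 23) <= 1:
        return "23"
    return None


def obeys_12_23_rule(signal_a: str, signal_b: str) -> bool:
    """Efficient recombination pairs one 12-bp-spacer signal with one 23-bp-spacer signal."""
    return {spacer_class(signal_a), spacer_class(signal_b)} == {"12", "23"}


rss12 = HEPTAMER + "T" * 12 + NONAMER
rss23 = HEPTAMER + "T" * 23 + NONAMER
print(obeys_12_23_rule(rss12, rss23))  # True
print(obeys_12_23_rule(rss12, rss12))  # False
```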

Figure 5. Schematic Structure of the Sea Urchin RAG1-Like Sequences

Contig accession numbers are shown in the left column. Contigs used as reverse complements are marked by “c” followed by the contig number. In each contig, the RAG1-like protein (white rectangle) is schematically aligned with the human RAG1 core (top rectangle). Nucleotide positions of the RAG1-like sequences are shown beneath the white rectangles. Three pairs of recently duplicated sequences (nucleotide identity higher than 95%) are underlined by red, green, and black lines, respectively. Transposable and repetitive elements detected in the flanking regions are marked by filled rectangles, with their names shown above the rectangles. Asterisks denote stop codons in the corresponding RAG1-like sequences. BLASTP E-values for the similarity between each sea urchin protein and RAG1 are shown above the white rectangles. The multiple alignment of these protein sequences is shown in Figure S5.
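Orienting a contig marked with “c” and testing whether two copies exceed the 95% nucleotide-identity cutoff takes two small helpers. A sketch assuming the two copies are already aligned to equal length with '-' for gaps, and that identity is computed over gap-free columns (the caption does not define the denominator); the names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence; other characters pass through unchanged."""
    return seq.translate(COMPLEMENT)[::-1]


def percent_identity(a: str, b: str) -> float:
    """Percent identical bases over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def recently_duplicated(a: str, b: str, cutoff: float = 95.0) -> bool:
    """True if identity is strictly higher than the cutoff (>95% in Figure 5)."""
    return percent_identity(a, b) > cutoff


print(reverse_complement("ATGCC"))  # GGCAT
```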

Figure 6. Multiple Alignment of the RAG1 N-Terminal Domain and Sea Urchin Protein Sequences

RAG1_HS, RAG1_PD, RAG1_SS, RAG1_RM, and RAG1_LM denote the human (GenBank accession number NP_000439), lungfish (AAS75810), pig (BAC54968), stripe-sided rhabdornis bird (Rhabdornis mysticalis; AAQ76078), and coelacanth (Latimeria; AAS75807) proteins, respectively. The sea urchin and lancelet proteins are marked by “_SP” and “_BF” following the identification numbers of the corresponding contigs. Protein sequences assembled from the sea urchin and lancelet WGS Trace Archives are denoted P4-P5_SP and P1-P5_BF, respectively. Three conserved motifs are underlined and numbered; the third is known as the ring finger. Numbers indicate distances from the protein N-termini.
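The ring finger named as the third motif can be located in a protein sequence with a pattern search. The sketch below encodes a commonly cited C3HC4 RING consensus, C-x2-C-x(9-39)-C-x(1-3)-H-x(2-3)-C-x2-C-x(4-48)-C-x2-C; that consensus is an assumption here rather than the motif definition used for Figure 6, and the test sequence is synthetic.

```python
import re

# C3HC4 RING consensus: C-x2-C-x(9-39)-C-x(1-3)-H-x(2-3)-C-x2-C-x(4-48)-C-x2-C
RING_C3HC4 = re.compile(r"C.{2}C.{9,39}C.{1,3}H.{2,3}C.{2}C.{4,48}C.{2}C")


def find_ring_fingers(protein: str):
    """Return (start, end) spans of non-overlapping matches.

    Alignment gaps ('-') are stripped first so the spacing constraints count
    residues only; spans are 0-based, end-exclusive positions in the
    ungapped sequence.
    """
    seq = protein.replace("-", "").upper()
    return [m.span() for m in RING_C3HC4.finditer(seq)]


# Synthetic sequence built to fit the consensus (not a real protein).
synthetic = "MM" + "CAAC" + "A" * 10 + "CAH" + "AAC" + "AAC" + "A" * 5 + "CAAC" + "MM"
print(find_ring_fingers(synthetic))  # [(2, 34)]
```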

References

1. Tonegawa S. Somatic generation of antibody diversity. Nature. 1983;302:575–581.
2. Oettinger MA, Schatz DG, Gorka C, Baltimore D. RAG-1 and RAG-2, adjacent genes that synergistically activate V(D)J recombination. Science. 1990;248:1517–1523.
3. Gellert M. V(D)J recombination: RAG proteins, repair factors, and regulation. Annu Rev Biochem. 2002;71:101–132.
4. Sakano H, Huppi K, Heinrich G, Tonegawa S. Sequences at the somatic recombination sites of immunoglobulin light-chain genes. Nature. 1979;280:288–294.
5. Akira S, Okazaki K, Sakano H. Two pairs of recombination signals are sufficient to cause immunoglobulin V-(D)-J joining. Science. 1987;238:1134–1138.
