Shuffling of genes within low-copy repeats on 22q11 (LCR22) by Alu-mediated recombination events during evolution

Melanie Babcock et al. Genome Res. 2003 Dec.

Abstract

Low-copy repeats, or segmental duplications, are highly dynamic regions in the genome. The low-copy repeats on chromosome 22q11.2 (LCR22) are a complex mosaic of genes and pseudogenes formed by duplication processes; they mediate chromosome rearrangements associated with velo-cardio-facial syndrome/DiGeorge syndrome, der(22) syndrome, and cat-eye syndrome. The ability to trace the substrates and products of recombination events provides a unique opportunity to identify the mechanisms responsible for shaping LCR22s. We examined the genomic sequence of known LCR22 genes and their duplicated derivatives. We found Alu (SINE) elements at the breakpoints in the substrates and at the junctions in the truncated products of recombination for USP18, GGT, and GGTLA, consistent with Alu-mediated unequal crossing-over events. In addition, we were able to trace a likely interchromosomal Alu-mediated fusion between IGSF3 on 1p13.1 and GGT on 22q11.2. Breakpoints occurred inside Alu elements as well as in the 5' or 3' ends of them. A possible stimulus for the 5' or 3' terminal rearrangements may be the high sequence similarities between different Alu elements, combined with a potential recombinogenic role of retrotransposon target-site duplications flanking the Alu element, containing potentially kinkable DNA sites. Such sites may represent focal points for recombination. Thus, genome shuffling by Alu-mediated rearrangements has contributed to genome architecture during primate evolution.

Figures

Figure 1

(A) Genes in the LCR22s. Four functional genes, USP18 (red), GGT (yellow), GGTLA (green), and BCR (blue), map to LCR22-2, LCR22-8, LCR22-7, and LCR22-6, respectively. Each was copied during evolution, resulting in a complex pattern within blocks comprising LCR22s (colored blocks corresponding to LCR22 genes, orientation shown). The orientation of the genes and pseudogene copies with respect to the centromere is indicated. (B) Chromosome rearrangement disorders on 22q11.2. The bars under LCR22-2, LCR22-3a, and LCR22-4 depict the intervals harboring the common deletion endpoints, duplication endpoints, and translocation breakpoints in patients with VCFS/DGS, CES, and der(22) syndrome, respectively. (C) Northern blot analysis. We performed a Northern blot analysis using expressed sequence tag (EST) DNA probes for USP18, GGT, GGTLA, and BCR. Autoradiograms of human multitissue Northern blots (Clontech) containing heart, brain (whole), placenta, lung, liver, skeletal muscle, kidney, and pancreas tissues were probed with radiolabeled PCR products from ESTs. The USP18, GGT, and BCR probes are derived from the last exon (except for USP18 in LCR22-3a), which would recognize all of the duplicated copies of each on chromosome 22 if transcribed. The band sizes for BCR are 4.79 kb and 7.50 kb; the expected sizes were 2.6 kb and 4.7 kb. The band sizes for GGT are 1.29 kb and 3.00 kb; the expected sizes were 1.8 kb and 2.5 kb. The band sizes for GGTLA are 1.58 kb and 2.8 kb; the expected size was 2.4 kb. The band size for USP18 is 1.95 kb; the expected size was 1.8 kb. (D) Low-stringency FISH mapping. Probes from LCR22-2 (GenBank AC008132) and LCR22-4 (GenBank AC009288) were used for low-stringency FISH mapping. Hybridization signals were detected in the vicinity of chromosomes 1p13, 2p11, 5p13, 13p11, and 20p12. The strongest signals were detected on chromosome 22q11, due to the presence of multiple copies of sequences contained within the LCR22 clones.

Figure 2

(A) Position of USP18 within the block structure of the LCR22s. Each of the LCR22s is ordered with respect to the centromere of chromosome 22q11.2. The block structure of LCR22 was created using miropeats software to detect sequence relationships. The colors of the blocks were chosen to be coordinated with the genes within. The USP18 gene (red) is shown in its proper transcription orientation. The position of the functional USP18 locus (most centromeric copy in LCR22) and its unprocessed pseudogene copies within the LCR22 block structure are shown. (B) Duplication events for USP18. The exons (numbered red boxes) and high-copy repeating elements are shown (in a “+” orientation above the line and in a “-” orientation below the line). Alu elements are indicated by the subfamily. Mer elements (abbreviated as “M”) were drawn as tracked by RepeatMasker (UCSC browser; June 2002 assembly) in the vicinity of the USP18 functional locus as shown (11 exons; chr22:15573184-15600410). R/C, reverse and complement. The position of the breakpoints in the USP18 functional locus (substrate) and junctions in the duplicated copies (products) are indicated with a vertical line separating the juxtaposed intervals, shown in different colors depending on the LCR22 block to which they map. The Alu elements involved in the recombination events have black fill. The positions of the breakpoints in LCR22-2 that are shown are 15685023-15689864 (LCR22-2), 18244496-18249337 (LCR22-4), and 17418000-17420692 (LCR22-3a). A different breakpoint in the interval between exons 10 and 11 occurred, creating the copy in LCR22-2, shown at positions 15790288-15794241 and LCR22-4 at 18351456-18355409. For more details see Supplemental Figures 1-3.
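
The coordinate ranges quoted in panel B can be compared by computing their lengths. The following is a minimal sketch, not part of the original analysis; it assumes the ranges are 1-based and inclusive, and the group labels "A" and "B" are hypothetical names introduced only for this illustration.

```python
# Coordinate ranges quoted in the Figure 2 caption (chr22, June 2002 assembly).
# Group labels "A" and "B" are hypothetical: "A" is the first set of positions
# listed, "B" is the breakpoint between exons 10 and 11.
INTERVALS = {
    ("A", "LCR22-2"): (15685023, 15689864),
    ("A", "LCR22-4"): (18244496, 18249337),
    ("A", "LCR22-3a"): (17418000, 17420692),
    ("B", "LCR22-2"): (15790288, 15794241),
    ("B", "LCR22-4"): (18351456, 18355409),
}


def span_length(start, end):
    """Length in bp of a range, assuming 1-based inclusive coordinates."""
    return end - start + 1


LENGTHS = {key: span_length(*coords) for key, coords in INTERVALS.items()}

for (group, lcr), length in LENGTHS.items():
    print(f"group {group}  {lcr:<9} {length} bp")
```

Under the inclusive-coordinate assumption, the LCR22-2 and LCR22-4 ranges have the same length within each group (4842 bp in group A, 3954 bp in group B), while the LCR22-3a range is shorter (2693 bp).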

Figure 3

(A) Position of Alu elements involved in USP18 copies in LCR22-2, -3a, and -4. The USP18 gene (light red shapes; exonic orientation shown) within the block structure of LCR22-2, -3a, and -4 is shown. Alu elements A1 (black), A2 (gray), and A3 (white) are illustrated in the “+” or “-” orientation. The duplicated copies of the Alu elements within their respective copies of USP18 are illustrated. (B) LCR22 breakpoints occurred within Alu elements. The position of the Alus (A1, black; A4, light gray; A2, dark gray; A3, white) at the breakpoints (dotted line) in the USP18 functional locus (light red) and copies are shown. (C) Alus involved in breakpoints with respect to the genomic organization of USP18. The genomic organization of USP18 was determined by comparing the cDNA sequence with the human genomic sequence by BLAST analysis. The position of Alus A1, A2, A3, and A4 with respect to genomic organization of USP18 is illustrated (dotted line). (D) Structure of Alu elements A1, A2, A3, and A4, and their duplicated copies. Each Alu monomer (red, pink) is illustrated on either side of the A-rich spacer (blue). The 3′ poly A tail is shown (black). Alu A1 and its copy, A1b in LCR22-3a, contain the first monomer and spacer only, and both are part of the Alu Sq family. Alu A2, Alu Jo, and its copy in LCR22-3a, A2b are shown. The 5′ part of the Alu is not part of the Alu Jo subclass (yellow box). Alu A4 is a member of the Alu Sc subclass. The duplicated copies, A4a1 and A4a2, are chimeric, composed in part with sequences from A4 and part from another Alu (yellow box). Alu A3 and its copies, A3b1, A3b2, A3a1, and A3a2 are illustrated. Alus A3b1 and A3b2 contain only the second monomer and poly A site from Alu A3 and the rest from another Alu element (yellow). More details are provided in the Supplemental Figures 1-3.

Figure 4

(A) Positions of GGT, IGSF3, and predicted gene DKFZp434p211 within the block structure of the LCR22s. Each of the LCR22s is ordered with respect to the centromere of chromosome 22q11.2. The block structure of LCR22 was created using miropeats software to detect sequence relationships. The colors of the blocks were chosen to coordinate with the genes within. The GGT (yellow), IGSF3 (orange), and DKFZp434p211 (overlapping with BCR; blue) genes are shown in their proper transcription orientation. The position of the functional GGT locus (LCR22-8) and its unprocessed pseudogene copies within the LCR22 block structure are shown. (B) Recombination events in GGT and IGSF3. GGT exons 3-17 (chr22:21695000-21722900; yellow numbered boxes) became duplicated to LCR22-2 and LCR22-4. IGSF3 (orange; 1p13.1; chr1:117622772-117711984) became juxtaposed to the copies of GGT. R/C, reverse/complement. Both GGT and IGSF3 harbor the Alu S subfamily member (Alu B1, IGSF3; Alu B2, GGT) at the breakpoint junction (Alu B2a1, Alu B2a2, LCR22-2 and LCR22-4, respectively) between the two functional genes (black-filled elements). The products of the recombination are shown (LCR22-2, chr22:15700573-15723288 and LCR22-4, chr22:18260039-18282533). (C) Recombination events in GGT and predicted gene DKFZp434p211. The two substrates, GGT and DKFZp434p211 (overlapping with BCR), are shown. R/C, reverse/complement. Examination of the pattern of exons and high-copy repetitive elements revealed an Alu Y that was present at the junction between the two substrates in the duplicated products of the recombination event (black fill) in LCR22-5, LCR22-7, and LCR20 (Suppl. Fig. 4). The L2 LINE elements upstream of exon 13 in GGT (yellow fill) and the Alu elements upstream of exon 1 of DKFZp434p211 (interval) are indicated (blue line). A putative unequal crossover occurred between Alu C1 and Alu C2 in duplicated copies of GGT and DKFZp434p211, resulting in the fusion product shown. Sequences upstream and including Alu C2 in the fusion product were present in the GGT substrate, and sequences distal to the Alu were present in the DKFZp434p211 substrate.
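
The fusion logic described in panel C (one substrate contributes everything up to a breakpoint in its Alu, the other contributes everything distal to the matching position in its own Alu) can be illustrated with a toy string model. This is a sketch under stated assumptions, not the authors' method: the function name `alu_fusion`, the sequences, and the flank strings are all invented placeholders, with Alu copies in uppercase and flanks in lowercase.

```python
def alu_fusion(upstream_substrate, downstream_substrate, alu_a, alu_b, offset):
    """Join two substrates at a breakpoint `offset` bases into their Alu copies.

    Sequence up to the breakpoint is taken from `upstream_substrate`;
    sequence after it is taken from `downstream_substrate`.
    """
    if not 0 <= offset <= min(len(alu_a), len(alu_b)):
        raise ValueError("offset must fall inside both Alu copies")
    start_a = upstream_substrate.index(alu_a)   # where the Alu begins in substrate A
    start_b = downstream_substrate.index(alu_b)  # where the Alu begins in substrate B
    return (upstream_substrate[: start_a + offset]
            + downstream_substrate[start_b + offset:])


# Invented placeholder sequences; the two Alu copies differ at one position.
ALU_A = "GGCCGGGCGC"
ALU_B = "GGCCGGGTGC"
SUBSTRATE_A = "acgtac" + ALU_A + "ttgaca"
SUBSTRATE_B = "catgga" + ALU_B + "agctag"

# Breakpoint at the 3' end of the Alu: the whole Alu comes from substrate A.
FUSION_3PRIME = alu_fusion(SUBSTRATE_A, SUBSTRATE_B, ALU_A, ALU_B, len(ALU_A))
# Breakpoint inside the Alu: the product carries a chimeric Alu.
FUSION_INTERNAL = alu_fusion(SUBSTRATE_A, SUBSTRATE_B, ALU_A, ALU_B, 4)

print(FUSION_3PRIME)
print(FUSION_INTERNAL)
```

In the first call the product keeps the 5′ flank and the complete Alu of substrate A followed by the 3′ flank of substrate B; in the second the Alu in the product is a chimera of the two copies.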

Figure 5

(A) Position of Alu elements involved in shaping GGT in the LCR22s. The GGT gene (yellow shapes; exonic orientation shown), IGSF3 (orange shapes), and DKFZp434p211 (blue shapes) within the block structure of the LCR22s are shown. Alu elements B1 (light gray), B2 (charcoal gray), C1 (light gray), and C2 (white) are illustrated in the “+” or “-” orientation. The duplicated copies of the Alu elements within their respective copies of GGT are illustrated. (B) Alus involved in breakpoints with respect to the genomic organization of GGT. The genomic organization of GGT was determined by comparing the cDNA sequence with the human genomic sequence by BLAST analysis. The position of Alus B1 and C1 with respect to the genomic organization of GGT is illustrated (dotted line). (C) Structure of Alu elements B2 and C2, including their duplicated copies. Each Alu monomer (red, pink) is illustrated on either side of the A-rich spacer (blue). The 3′ poly A tail is shown (black). Alu B2 and its copies are part of the Alu Sq subfamily (see Suppl. Fig. 4). Alu C2 and its copies are part of the Alu Y subfamily (see Suppl. Fig. 5). (D) Breakpoints within Alu targets. Alu repeats are shown in uppercase and flanking sequences in lowercase. The expected breakpoint positions are marked in red. Potential target site duplications are marked in boldface and underlined. (D1) Breakpoint within 3′ Alu target of Alu B2. The 5′ flanks of the Alu B2 substrate and its products B2a1, B2a2 are homologous, but this is not true of the 3′ flanks. The end of homology between Alu B2 and substrates corresponds to the position of the breakpoint. Alus B2a1 and B2a2 contain a variable (CA)n microsatellite within their poly A tails, so it is difficult to mark the exact position. However, the presence of two Gs immediately flanking the poly A tail in both the substrate and products suggests the likely position of the breakpoint. During L1 endonuclease-mediated integration, the target sequence is duplicated at the 3′ end, except for the first two nucleotides. The position of the presumed breakpoint (GA) corresponds to the start of the (partially preserved) 3′ Alu target-site duplication, i.e., 3′ duplicated target of the Alu insertion. Thus the recombination breakpoint coincides with the L1 endonuclease target site, which can be attacked by the L1 endonuclease. (D2) Breakpoint within 5′ target of Alu C2. The 3′ flanks of the Alu C2 and products C2a1, C2a2, C2a3 are homologous, but this is not true of the 5′ flanks; the end of homology between C2 and substrates corresponds to the position of the breakpoint. The highlighted TTAA motif (yellow) corresponds to the putative original target; the first DNA nick probably occurred between TT and AA, 13 bp upstream of Alu C2a. The breakpoint within products is located 15 bp downstream of the expected first nick, perfectly fitting with the L1 endonuclease preference for a second nick 15-16 bp downstream of the first one (Jurka 1997). Thus the breakpoint could be initiated by L1 endonuclease revisiting the original Alu target, and later repaired by homologous recombination. The second DNA nick probably occurred 2 bp downstream compared to the original insertion, and thus the first two nucleotides were not carried during the recombination event. See more details in Figure 9.

Figure 6

(A) Position of GGTLA within the block structure of the LCR22s. Each of the LCR22s is ordered with respect to the centromere of chromosome 22q11.2. The GGTLA (green) gene is shown in its proper transcription orientation. The functional GGTLA locus is in LCR22-7. (B) Recombination events for GGTLA in LCR22-7 to form products in LCR22-5 and LCR20. The different recombination events in the different Alu elements shaping LCR22-5 (19661038-19665200, top; 19692592-19699596) and LCR20 (chr 20:23944266-23949403) are shown. Examination of the GGTLA intron 1 in LCR22-7 revealed duplicated copies on LCR22-8 (unrearranged), LCR22-5, and LCR20. We found rearrangements in LCR22-5 and LCR20. For LCR22-5, a single breakpoint within Alu D5 was responsible for creating the two reciprocal copies, one in the proximal end of LCR22-5 and one at the distal end of LCR22-5. Both form the borders of the LCR22. A different breakpoint, in Alu D4, was responsible for shaping the border of LCR20.

Figure 7

(A) Position of Alu elements involved in GGTLA rearrangements. The GGTLA gene (green shapes; exonic orientation shown) within the block structure of the LCR22s is shown. Alu elements D4 (gray) and D5 (white) are illustrated in the “+” or “-” orientation, associated with their GGTLA and gene copies. (B) Alus involved in breakpoints with respect to the genomic organization of GGTLA. The genomic organization of GGTLA was determined by comparing the cDNA sequence with the human genomic sequence by BLAST analysis. The position of Alus D4 and D5 with respect to the genomic organization of GGTLA is illustrated (dotted line). (C) Structure of Alu element D5 including its duplicated copies. A breakpoint in Alu D5 resulted in two reciprocal copies, D5c and D5e. For more details, see Supplemental Figure 7.

Figure 8

A model of insertion of IGSF3 into LCR22-2 and LCR22-4; interchromosomal recombination between IGSF3 on chromosome 1 and GGT on chromosome 22 (see Fig. 6 in Richardson et al. 1998), explaining the mechanism by which recombination occurs on nonhomologous chromosomes, thereby avoiding crossovers which would lead to aberrant translocations. In this model, a breakpoint in chromosome 22 occurred, presumably at one end of misaligned Alu elements (black boxes). The broken ends from chromosome 22 then would invade the homologous sequence, the Alu (black box) on chromosome 1, forming a D-loop. The invading end would prime DNA synthesis, extending, in this case, a significant distance on chromosome 1. The process would involve the migration of the D-loop into nonhomologous sequences downstream of the region of homology (the Alu). At a further distance, the newly synthesized strand would rejoin chromosome 22 in a region of homology (or nonhomology) between chromosomes 1 and 22. Thus, this model combines homologous recombination in the absence of a crossover with nonhomologous repair. It was proposed for mitotic rearrangements (Richardson et al. 1998), but could be envisioned for meiotic rearrangements as well.

Figure 9

Model of Alu integration and generation of breakpoint near integrated Alu. This is based upon the B2, C2, and D4 rearrangements. (A) Enzymatic nicking in the presence of RNA indicated by vertical black arrow. (B) Synthesis of cDNA, indicated by dotted line, and formation of a second nick, indicated by black arrow on the opposite strand. (C) Completion of reverse transcription and DNA-dependent DNA synthesis, indicated by a dashed line and the lowercase letters, followed by ligation. (D) Elimination of RNA and synthesis of the second DNA strand. The integrated Alu element is surrounded by the target-site duplications (TSDs), usually 15-16-bp long. TSDs are marked in boldface and underlined. (E) Potential sites for secondary attacks by the L1 endonuclease, in 5′ and 3′ duplicated targets, are indicated by the black arrows. The 5′ flanking sequence contains an intact target TTAAAAN.NYTN; the 3′ duplicated target lacks the first two nucleotides (typically TT). Modified from Jurka (1997).
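
The flank pattern described in panel E (an intact TTAAAA-type target in the 5′ flank, and a 3′ duplicated target that lacks the leading TT) follows from duplicating only the stretch between the two staggered nicks. The sketch below is a toy string model of that bookkeeping, not a mechanistic simulation: the function names, the target sequence, and the `ALU` placeholder are invented for illustration, and a 15-bp stagger is assumed from the 15-16 bp range given in the caption.

```python
import re


def first_nick_sites(sequence):
    """Positions of candidate first nicks, taken as the TT|AAAA boundary."""
    return [m.start() + 2 for m in re.finditer(r"(?=ttaaaa)", sequence.lower())]


def integrate_with_tsd(target, nick_index, alu, tsd_length=15):
    """Insert `alu` into `target`, duplicating the stretch between two nicks.

    The second nick is assumed to lie `tsd_length` bases downstream of
    `nick_index`; that stretch ends up on both sides of the insert.
    """
    tsd = target[nick_index: nick_index + tsd_length]
    if len(tsd) < tsd_length:
        raise ValueError("target too short for the requested stagger")
    upstream = target[:nick_index]
    downstream = target[nick_index + tsd_length:]
    return upstream + tsd + alu + tsd + downstream, tsd


# Invented placeholders: lowercase target, uppercase stand-in for an Alu.
TARGET = "gcgcattaaaagcctgatcgtaccgtgg"
ALU = "GGCCGGGCGCGGTGGCTCAAAAAA"

NICK = first_nick_sites(TARGET)[0]
INTEGRATED, TSD = integrate_with_tsd(TARGET, NICK, ALU)
FIVE_PRIME_FLANK, THREE_PRIME_FLANK = INTEGRATED.split(ALU)

print(FIVE_PRIME_FLANK)   # ends with the intact target, including "tt"
print(THREE_PRIME_FLANK)  # starts with the duplicated stretch, without "tt"
```

In this toy case the 5′ flank still contains the full `ttaaaa` site, whereas the 3′ copy begins at `aaaa`, which is the asymmetry the caption points to as the reason secondary nicking sites differ between the two flanks.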

References

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215: 403-410.
Bailey, J.A., Gu, Z., Clark, R.A., Reinert, K., Samonte, R.V., Schwartz, S., Adams, M.D., Myers, E.W., Li, P.W., and Eichler, E.E. 2002a. Recent segmental duplications in the human genome. Science 297: 1003-1007.
Bailey, J.A., Yavor, A.M., Viggiano, L., Misceo, D., Horvath, J.E., Archidiacono, N., Schwartz, S., Rocchi, M., and Eichler, E.E. 2002b. Human-specific duplication and mosaic transcripts: The recent paralogous structure of chromosome 22. Am. J. Hum. Genet. 70: 83-100.
Baker, M.D., Read, L.R., Beatty, B.G., and Ng, P. 1996. Requirements for ectopic homologous recombination in mammalian somatic cells. Mol. Cell. Biol. 16: 7122-7132.
Batzer, M.A. and Deininger, P.L. 2002. Alu repeats and human genomic diversity. Nat. Rev. Genet. 3: 370-379.
