Characterization of the past and current duplication activities in the human 22q11.2 region

Xingyi Guo et al. BMC Genomics. 2011.

Abstract

Background: Segmental duplications (SDs) on 22q11.2 (LCR22) serve as substrates for meiotic non-allelic homologous recombination (NAHR) events, resulting in several clinically significant genomic disorders.

Results: To understand the duplication activity leading to the complicated SD structure of this region, we applied the A-Bruijn graph algorithm to decompose the 22q11.2 SDs into 523 fundamental duplication sequences, termed subunits. Cross-species syntenic analysis of primate genomes demonstrates that many of these LCR22 subunits emerged very recently, especially those implicated in human genomic disorders. Some subunits have expanded more actively than others, and young Alu SINEs are associated much more frequently with duplicated sequences that have undergone active expansion, confirming their role in mediating recombination events. Many copy number variations (CNVs) exist on 22q11.2, some flanked by SDs. Interestingly, for 13 CNVs (mean length 65 kb), both chromosome breakpoints are located in paralogous subunits, providing direct evidence that SD subunits could contribute to CNV formation. Sequence analysis of PACs or BACs identified additional CNVs, specifically 10 insertions and 18 deletions within 22q11.2; four were larger than 10 kb, and most contained young AluYs at their breakpoints.

Conclusions: Our study indicates that AluYs are implicated in past and current duplication events and, moreover, suggests that DNA rearrangements in 22q11.2 genomic disorders may not occur randomly but instead involve both actively expanded duplication subunits and Alu elements.

Figures

Figure 1

A schematic cartoon of the decomposition of segmental duplications into duplication subunits and the construction of a map of putative duplication events. (A) Five hypothetical duplication loci (a-e) are depicted with their duplication history shown below. Note that in real cases the historical duplication directions can only be inferred, because the duplications occurred in the past and cannot be observed directly. (B) The segmental duplication data for these five loci are represented by seven pairs of duplicons (boxes connected by dashed lines). A total of 202 such pairs exist for 22q11.2 based on sequence comparison. (C) Fifteen duplication subunits (forming six paralogous families) decomposed from the pair-wise alignment information in B. (D) The five duplication loci are grouped and all loci are then aligned to the "a" locus, which is the largest one. Note that the entire locus "a" has to be derived by merging the left duplicons in SD1 and SD2. A total of 33 such duplication groups were defined for 22q11.2, containing 174 duplication loci (see Figure 2B).

Figure 2

The mosaic architecture of segmental duplications in the human 22q11.2 region. (A) Duplicated subunits, genes and pseudogenes. The 22q11.2 region is depicted as a grey line and colored boxes for unique and SD sequences, respectively. Eight duplicated blocks are labelled with red arrow lines for the current boundary definition (Table 1) and blue arrow lines for the previous definition [14]. Paralogous subunits (i.e., in the same subunit family) are shown in the same color. For simplicity, both genes (green) and pseudogenes (purple) are drawn without names. (B) Hierarchy of non-overlapping duplicated loci. A total of 33 groups of duplication loci in 22q11.2 were identified and all loci were aligned to the largest locus of their corresponding groups (all subunits have the same color as in Figure 2A). Horizontal order shows relative chromosome locations, with white spaces added to separate sequences in distinct duplication groups. Arrows point to paralogous subunits at the breakpoints of recurrent (> 5) duplications; numbers below them are the total subunits at breakpoints and the subsets with Alu elements. A gap in LCR22-3a' is represented by a dashed line with 'N'.

Figure 3

Synteny of SDs on 22q11.2. (A) The syntenic relationship of the subunits with chimpanzee, orangutan and macaque is shown as present (matching color boxes) or absent (white). This map was derived from our analysis of the multi-genome alignment data in the Ensembl database (see Methods). The boxed region in LCR22-5' was subsequently confirmed by PCR to be absent in the macaque genome (see Additional file 3, Figure S2). (B) Comparison of primate segmental duplications. The data were retrieved from a previous study using WSSD analysis for SD detection [29]. The depth of sequence read coverage (number of shotgun sequencing reads in 5-kb windows) is depicted for human (HSA), chimpanzee (PTR), orangutan (PPY) and macaque (MMU) based on alignment of reads against the human genome. Putative duplicated regions with excess read depth (more than three standard deviations above the mean) are shown in red, with unique regions in green. Human and chimpanzee SDs derived from depth analysis are also shown below the human SDs derived from WGAC analysis (top). The data suggest that most of the sequences in LCR22-2', -3a' and -4' are shared between human and chimpanzee and that their duplications likely occurred after the split of the African great apes from the Asian great apes. Interestingly, the human-specific SDs in LCR22-3a' and -4' show higher sequence identity (represented by light to dark orange color) than the rest of the SDs (light to dark grey). (C) Past duplication events that may have generated the homology between LCR22-3a' and LCR22-4'. Arrow lines represent putative duplication directions. The large cyan subunit in LCR22-3a' may have arisen from either the proximal or distal paralogous sequences in LCR22-4'.
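The depth criterion in panel B can be illustrated with a minimal sketch. It assumes that the mean and standard deviation are estimated from a separate set of control windows believed to be unique sequence; the caption does not say how they were obtained. The function name `flag_excess_depth` and the toy read counts are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def flag_excess_depth(window_counts, control_counts, n_sd=3.0):
    """Flag windows whose read count exceeds mean + n_sd * SD.

    window_counts: read counts per window (5-kb windows in the caption).
    control_counts: read counts in windows assumed to be unique sequence
        (an assumption of this sketch, not stated in the caption).
    """
    mu = mean(control_counts)
    sd = stdev(control_counts)  # sample standard deviation
    threshold = mu + n_sd * sd
    return [count > threshold for count in window_counts]


# Toy data, invented for illustration only.
control = [100, 98, 102, 101, 99]
windows = [100, 150, 97, 210]
flags = flag_excess_depth(windows, control)
print(flags)
```

Windows flagged `True` would correspond to the red (putatively duplicated) regions in the panel, and the remaining windows to the green (unique) regions.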

Figure 4

Subunit families that spread across multiple LCR22' blocks are often adjacent to Alu repeats. (A) The SD subunits were assigned to different layers of circles, where the numbers represent the total number of blocks in which a subunit family has one or more members. For example, a subunit family is given 3 if its members are found in 3 of the 8 blocks, and consequently all subunits of this family are drawn in the circle labeled "3". (B) Relationship between selected sequence features and block occupancy for SD subunits. The _x_-axis shows the number of blocks a subunit family occupies (as in A). The _y_-axis shows the percentage of subunit endpoints with a given sequence feature. No subunit family was found in exactly five blocks.
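The block-occupancy number used in panel A can be sketched as a simple count of distinct blocks per family. This is an illustrative sketch only: the function name `block_occupancy`, the family labels and the toy assignments are hypothetical, and the block names are borrowed from the captions purely as example strings.

```python
from collections import defaultdict


def block_occupancy(subunits):
    """Count the distinct blocks in which each subunit family has a member.

    subunits: iterable of (family_id, block_name) pairs, one per subunit.
    Returns a dict mapping family_id to its number of occupied blocks.
    """
    blocks_by_family = defaultdict(set)
    for family, block in subunits:
        blocks_by_family[family].add(block)
    return {family: len(blocks) for family, blocks in blocks_by_family.items()}


# Toy data, invented for illustration only.
toy_subunits = [
    ("famA", "LCR22-2'"),
    ("famA", "LCR22-3a'"),
    ("famA", "LCR22-4'"),
    ("famA", "LCR22-4'"),  # a second member in the same block is counted once
    ("famB", "LCR22-2'"),
]
occupancy = block_occupancy(toy_subunits)
print(occupancy)
```

In this toy case `famA` would be drawn in the circle labeled "3" and `famB` in the circle labeled "1".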

Figure 5

Distribution of previously annotated CNVs in the 22q11.2 region. The gain and loss CNVs collected from previous publications are shown in blue and red, respectively. The bottom row illustrates SD subunits. The figure was prepared using the UCSC browser.

Figure 6

Distribution of BAC and other genomic clones and the CNVs derived from them. A total of 191 clones were mapped to the 22q11.2 region in the human reference genome, resulting in 28 CNVs (blue for gain and red for loss). Only clones with CNVs are shown here to simplify the figure. Coordinates of these CNVs are available in Table 3.

Figure 7

Many CNVs are flanked by paralogous subunits and/or Alu SINEs. (A) A total of 13 previously detected CNVs have their endpoints located in paralogous subunits. All subunits are colored as in Figure 2; in addition, gain CNVs are shown in blue and loss CNVs in red. One CNV marked with a "*" was found by our clone mapping (Figure 6). (B, C) Sequence features within ± 1 kb of the insertion sites of ten gain CNVs (B) or the two breakpoints of 18 loss CNVs (C) from the current clone mapping analysis. In (B) and (C), arrows point to the breakpoints; coordinates and other detailed information are in Table 3.
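The condition described in panel A, a CNV whose two endpoints fall in paralogous subunits, can be expressed as a short check. This is a minimal sketch under stated assumptions: subunits are given as non-overlapping `(start, end, family)` tuples, and "paralogous" is taken to mean two different subunits of the same family. The function names and coordinates are hypothetical and do not reproduce the authors' procedure.

```python
def subunit_at(position, subunits):
    """Return the (start, end, family) subunit containing position, or None."""
    for start, end, family in subunits:
        if start <= position <= end:
            return (start, end, family)
    return None


def flanked_by_paralogs(cnv_start, cnv_end, subunits):
    """True if both CNV endpoints lie in different subunits of one family."""
    left = subunit_at(cnv_start, subunits)
    right = subunit_at(cnv_end, subunits)
    return (
        left is not None
        and right is not None
        and left != right          # endpoints must be in two distinct subunits
        and left[2] == right[2]    # and those subunits must share a family
    )


# Toy coordinates, invented for illustration only.
toy_subunits = [(100, 200, "famA"), (500, 600, "famB"), (900, 1000, "famA")]
print(flanked_by_paralogs(150, 950, toy_subunits))  # both ends in famA subunits
print(flanked_by_paralogs(150, 550, toy_subunits))  # ends in different families
```

A linear scan is enough for a few hundred subunits; an interval index would only matter at a much larger scale.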

References

    1. Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, Adams MD, Myers EW, Li PW, Eichler EE. Recent segmental duplications in the human genome. Science. 2002;297(5583):1003–1007. doi: 10.1126/science.1072047.
    2. She X, Jiang Z, Clark RA, Liu G, Cheng Z, Tuzun E, Church DM, Sutton G, Halpern AL, Eichler EE. Shotgun sequence assembly and recent segmental duplications within the human genome. Nature. 2004;431(7011):927–930. doi: 10.1038/nature03062.
    3. Bailey JA, Yavor AM, Massa HF, Trask BJ, Eichler EE. Segmental duplications: organization and impact within the current human genome project assembly. Genome Res. 2001;11(6):1005–1017. doi: 10.1101/gr.GR-1871R.
    4. Halford S, Wadey R, Roberts C, Daw SC, Whiting JA, O'Donnell H, Dunham I, Bentley D, Lindsay E, Baldini A, et al. Isolation of a putative transcriptional regulator from the region of 22q11 deleted in DiGeorge syndrome, Shprintzen syndrome and familial congenital heart disease. Hum Mol Genet. 1993;2(12):2099–2107. doi: 10.1093/hmg/2.12.2099.
    5. Edelmann L, Pandita RK, Morrow BE. Low-copy repeats mediate the common 3-Mb deletion in patients with velo-cardio-facial syndrome. Am J Hum Genet. 1999;64(4):1076–1086. doi: 10.1086/302343.
