Generation and comparative analysis of approximately 3.3 Mb of mouse genomic sequence orthologous to the region of human chromosome 7q11.23 implicated in Williams syndrome

Comparative Study

doi: 10.1101/gr.214802.

Laura Elnitski, Jacquelyn R Idol, Johannah L Doyle, Weiniu Gan, James W Thomas, Scott Schwartz, Nicole L Dietrich, Stephen M Beckstrom-Sternberg, Jennifer C McDowell, Robert W Blakesley, Gerard G Bouffard, Pamela J Thomas, Jeffrey W Touchman, Webb Miller, Eric D Green

Udaya DeSilva et al. Genome Res. 2002 Jan.

Abstract

Williams syndrome is a complex developmental disorder that results from the heterozygous deletion of an approximately 1.6-Mb segment of human chromosome 7q11.23. These deletions are mediated by large (approximately 300 kb) duplicated blocks of DNA of near-identical sequence. Previously, we showed that the orthologous region of the mouse genome is devoid of such duplicated segments. Here, we extend our studies to include the generation of approximately 3.3 Mb of genomic sequence from the mouse Williams syndrome region, of which just over 1.4 Mb is finished to high accuracy. Comparative analyses of the mouse and human sequences within and immediately flanking the interval commonly deleted in Williams syndrome have facilitated the identification of nine previously unreported genes, provided detailed sequence-based information regarding 30 genes residing in the region, and revealed a number of potentially interesting conserved noncoding sequences. Finally, to facilitate comparative sequence analysis, we implemented several enhancements to the program, including the addition of links from annotated features within a generated percent-identity plot to specific records in public databases. Taken together, the results reported here provide an important comparative sequence resource that should catalyze additional studies of Williams syndrome, including those that aim to characterize genes within the commonly deleted interval and to develop mouse models of the disorder.

Figures

Figure 1

Long-range organization of human and mouse Williams syndrome (WS) regions. A physical map of the WS regions on human chromosome 7q and mouse chromosome 5G is depicted emphasizing the positions of the known genes residing within and flanking the interval commonly deleted in WS (DeSilva et al. 1999; Francke 1999; Hockenhull et al. 1999; Osborne 1999; Korenberg et al. 2000; Peoples et al. 2000; Valero et al. 2000). In the human WS region, this interval spans ∼1.6 Mb (indicated by a bold dashed line) and is flanked by duplicated blocks of DNA of near-identical sequence (estimated at ∼300 kb in size; indicated by dark rectangles). The relative positions of the centromere (CEN) and telomere (TEL) are indicated in each case. Note the inverted orientation of the two discontiguous segments of human chromosome 7 relative to the single contiguous segment of mouse chromosome 5G. The relative positions of the known human and mouse genes residing in this region are indicated, with additional details provided in Table 1. Depicted below the map of the mouse WS region are the 21 overlapping BAC/PAC clones selected for sequencing (see http://bio.cse.psu.edu/publications/desilva for a complete contig map of the mouse WS region), with the current sequencing status (finished, full shotgun, or working draft) indicated at the bottom (also see Table 2). Note that the depicted genomic regions and the BAC/PAC clones are not drawn to scale.

Figure 2

Representative portion of the percent-identity plot (PIP) comparing mouse and human sequence from the Williams syndrome (WS) region. The finished mouse sequence reported here was compared with the available orthologous human sequence using PipMaker. The complete PIP and details about the various annotations it contains are available at http://bio.cse.psu.edu/publications/desilva. Shown here is a ∼60-kb region containing portions of the Gtf2i/GTF2I and Gtf2ird1/GTF2IRD1 genes and the interval residing between them. Note that only gap-free segments that are ≥50% identical between mouse and human are plotted. The first two exons of Gtf2i/GTF2I and the last nine exons of Gtf2ird1/GTF2IRD1 are represented by vertical rectangles and numbered accordingly; most of these exons are associated with high levels of mouse–human sequence conservation. Note the two conserved noncoding sequences at ∼205 kb and ∼239 kb (both are gap-free segments of >100 bp in length with mouse–human sequence identities of >70% and >90%, respectively, as indicated by the different colored vertical lines at those positions). Also note the various colored horizontal bars drawn above the two genes; in the actual PDF file generated by PipMaker, these bars provide direct links to relevant Internet sites (e.g., appropriate PubMed citation[s] for the gene [pink], the GenBank record containing the predicted amino acid sequence of the protein encoded by the gene [light blue], and the LocusLink entry for the gene [dark blue]). The bookmarks along the left side provide links to compiled information about the various genes and other annotations generated during the comparative analysis of these sequences.
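
The two filtering rules stated in this caption can be made concrete. A PIP draws only gap-free segments that are ≥50% identical, and a conserved noncoding sequence is highlighted when a gap-free segment is >100 bp long with >70% identity. The sketch below is a minimal illustration under stated assumptions, not PipMaker's implementation. It assumes the mouse–human alignment has already been split into gap-free segments. The `GapFreeSegment` class and the `plotted_segments` and `conserved_noncoding` functions are hypothetical names, and the requirement that a conserved noncoding segment not overlap an annotated exon is an assumption added here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GapFreeSegment:
    """Hypothetical record for one gap-free block of a mouse-human alignment."""
    start: int   # 1-based start position in the mouse sequence
    mouse: str   # aligned mouse bases
    human: str   # aligned human bases (same length, no gap characters)

    def __post_init__(self) -> None:
        if not self.mouse or len(self.mouse) != len(self.human):
            raise ValueError("segment sequences must be non-empty and equal in length")
        if "-" in self.mouse or "-" in self.human:
            raise ValueError("a gap-free segment cannot contain gap characters")

    @property
    def length(self) -> int:
        return len(self.mouse)

    @property
    def end(self) -> int:
        # Last mouse position covered by the segment (inclusive).
        return self.start + self.length - 1

    @property
    def percent_identity(self) -> float:
        # Fraction of aligned positions with the same base, as a percentage.
        matches = sum(m == h for m, h in zip(self.mouse.upper(), self.human.upper()))
        return 100.0 * matches / self.length


def plotted_segments(segments: List[GapFreeSegment],
                     min_identity: float = 50.0) -> List[GapFreeSegment]:
    """Keep only segments at least min_identity percent identical (the >=50% plotting rule)."""
    return [s for s in segments if s.percent_identity >= min_identity]


def conserved_noncoding(segments: List[GapFreeSegment],
                        exons: List[Tuple[int, int]],
                        min_length: int = 100,
                        min_identity: float = 70.0) -> List[GapFreeSegment]:
    """Segments longer than min_length bp and above min_identity percent that do not
    overlap any exon; exons are 1-based inclusive (start, end) pairs (assumed format)."""
    def overlaps_exon(s: GapFreeSegment) -> bool:
        return any(s.start <= e_end and e_start <= s.end for e_start, e_end in exons)

    return [s for s in segments
            if s.length > min_length
            and s.percent_identity > min_identity
            and not overlaps_exon(s)]
```

Using `>=` for the plotting threshold and strict `>` for the length and identity thresholds mirrors the "≥50%" and ">100 bp, >70%" wording of the caption.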

Figure 3

Identification of previously unreported genes in the Williams syndrome (WS) region. Of the 30 genes identified within the ∼1.4 Mb of finished mouse sequence (see Table 4), 9 had not previously been reported to reside within the WS region. Information about each of these 9 genes is provided (listed in order across the mouse WS region), including (1) a representative GenBank accession number for the mouse cDNA sequence (note that in one case, BF522554, the only available cDNA sequence was from rat); (2) the type of sequence contained in that GenBank record (Riken full-length [FL] cDNA sequence [Kawai et al. 2001] or EST); (3) the percent identity between the mouse genomic sequence and the matching cDNA sequence; (4) whether the putative gene overlaps a Genscan-predicted gene (specifically, if >1 exon matches a Genscan-predicted exon or, in the case of AK019256, the single exon matches the predicted exon for >500 bp; note that the only gene not meeting these criteria, AK017044, did have one of its exons matching a Genscan-predicted exon); and (5) the gene-containing portion of the percent-identity plot (PIP) showing the pattern of mouse–human sequence conservation (except for AK005040 and AK017044, for which no human sequence was available). See Fig. 2 for additional details about the PIP.
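
The gene-overlap criterion in item (4) can likewise be expressed as a small predicate. This is a hedged sketch, not the authors' pipeline. It assumes that cDNA-derived exons and Genscan-predicted exons are given as 1-based inclusive (start, end) coordinates on the same mouse sequence. It also treats an exon as "matching" a predicted exon when the two overlap by at least 1 bp, which is an interpretation rather than something the caption specifies. The names `overlap_bp` and `overlaps_predicted_gene` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical coordinate convention: (start, end), 1-based, inclusive.
Interval = Tuple[int, int]


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two inclusive intervals (0 if they are disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def overlaps_predicted_gene(cdna_exons: List[Interval],
                            predicted_exons: List[Interval],
                            single_exon_min_bp: int = 500) -> bool:
    """Item (4) criterion as interpreted here.

    Single-exon gene: the exon must overlap some predicted exon by more than
    single_exon_min_bp bases. Multi-exon gene: more than one exon must share at
    least 1 bp (assumed meaning of "matches") with some predicted exon.
    """
    if len(cdna_exons) == 1:
        # Largest overlap between the lone exon and any predicted exon.
        best = max((overlap_bp(cdna_exons[0], p) for p in predicted_exons), default=0)
        return best > single_exon_min_bp
    matched = sum(1 for exon in cdna_exons
                  if any(overlap_bp(exon, p) > 0 for p in predicted_exons))
    return matched > 1
```

Under this reading, a multi-exon gene with only one exon matching a prediction (the situation the caption describes for AK017044) returns `False`.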

References

    1. Ansari-Lari MA, Oeltjen JC, Schwartz S, Zhang Z, Muzny DM, Lu J, Gorrell JH, Chinault AC, Belmont JW, Miller W, et al. Comparative sequence analysis of a gene-rich cluster at human chromosome 12p13 and its syntenic region in mouse chromosome 6. Genome Res. 1998;8:29–40.
    1. Antequera F, Bird A. Number of CpG islands and genes in human and mouse. Proc Natl Acad Sci. 1993;90:11995–11999.
    1. Battey J, Jordan E, Cox D, Dove W. An action plan for mouse genomics. Nat Genet. 1999;21:73–75.
    1. Batzoglou S, Pachter L, Mesirov JP, Berger B, Lander ES. Human and mouse gene structure: Comparative analysis and application to exon prediction. Genome Res. 2000;10:950–958.
    1. Baumer A, Dutly F, Balmer D, Riegel M, Tukel T, Krajewska-Walasek M, Schinzel AA. High level of unequal meiotic crossovers at the origin of the 22q11.2 and 7q11.23 deletions. Hum Mol Genet. 1998;7:887–894.
