Comparative DNA sequence analysis of mouse and human protocadherin gene clusters

Comparative Study

Genome Res. 2001 Mar;11(3):389-404.

doi: 10.1101/gr.167301.

Q Wu et al. Genome Res. 2001 Mar.

Abstract

The genomic organization of the human protocadherin alpha, beta, and gamma gene clusters (designated Pcdh alpha [gene symbol PCDHA], Pcdh beta [PCDHB], and Pcdh gamma [PCDHG]) is remarkably similar to that of immunoglobulin and T-cell receptor genes. The extracellular and transmembrane domains of each protocadherin protein are encoded by an unusually large "variable" region exon, while the intracellular domains are encoded by three small "constant" region exons located downstream from a tandem array of variable region exons. Here we report the results of a comparative DNA sequence analysis of the orthologous human (750 kb) and mouse (900 kb) protocadherin gene clusters. The organization of Pcdh alpha and Pcdh gamma gene clusters in the two species is virtually identical, whereas the mouse Pcdh beta gene cluster is larger and contains more genes than the human Pcdh beta gene cluster. We identified conserved DNA sequences upstream of the variable region exons, and found that these sequences are more conserved between orthologs than between paralogs. Within this region, there is a highly conserved DNA sequence motif located at about the same position upstream of the translation start codon of each variable region exon. In addition, the variable region of each gene cluster contains a rich array of CpG islands, whose location corresponds to the position of each variable region exon. These observations are consistent with the proposal that the expression of each variable region exon is regulated by a distinct promoter, which is highly conserved between orthologous variable region exons in mouse and human.

Figures

Figure 1

Comparison of the organization of mouse and human protocadherin gene clusters. Shown are the genomic organization of three closely linked mouse protocadherin gene clusters (A) and comparisons of the genomic organization of mouse and human _Pcdh_α/CNR (B), _Pcdh_β (C), and _Pcdh_γ (D) gene clusters. The BAC clones used in the sequence analysis are shown below (A). The length of sequences between clusters is also shown in (A). Each gene family contains multiple tandem variable region exons indicated by a vertical color bar: (mauve) _Pcdh_α variable region exons; (turquoise) _Pcdh_β genes; (orange) _Pcdh_γ-b variable region exons; (green) _Pcdh_γ-a variable region exons; (yellow) C-type Pcdh variable region exons (present in both the _Pcdh_α and _Pcdh_γ gene clusters); (blue) relic or pseudogene variable region sequences (present in all three gene clusters); (pink) constant region exons. Abbreviations: Pcdh, protocadherin; V, variable region; C, constant region; M, mouse; H, human; r, relic; Θ, pseudogene.

Figure 2

Alignments of variable region 5′ splice sites of mouse _Pcdh_α (A) and _Pcdh_γ (B) gene clusters. The 5′ splice site sequences are shown in bold, with the consensus below each panel.

Figure 3

Phylogenetic trees of human and mouse _Pcdh_α (A), _Pcdh_β (B), and _Pcdh_γ (C) gene clusters. The trees were reconstructed using the neighbor-joining method of the PAUP program. The tree branches are labeled with the percentage support for that partition based on 1000 bootstrap replicates. Only bootstrap values of >50% are shown. The unrooted trees are rooted by midpoint prior to output.

Figure 4

Distribution of CpG islands in the genomic sequences of human and mouse protocadherin gene clusters. Shown are ratios of observed to expected CpG dinucleotide frequency of a 1000 bp sliding window in the region of human _Pcdh_α (A), _Pcdh_β (B), and _Pcdh_γ (C) and mouse _Pcdh_α (D), _Pcdh_β (E), and _Pcdh_γ (F) gene clusters. The peak of ratios correlates with the position of protocadherin variable region exons but not constant region exons. The position of each variable and constant region exon is indicated at the top of each panel. (CT), constant region exon.
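The sliding-window statistic described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the 100-bp step size, and the use of the common observed/expected formula (CpG count × window length) / (C count × G count) are assumptions, since the caption gives only the 1000-bp window size.

```python
def cpg_obs_exp(window: str) -> float:
    """Observed/expected CpG ratio for one window (0.0 if C or G is absent).

    Assumed formula: (number of CpG * window length) / (number of C * number of G).
    """
    s = window.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    return s.count("CG") * len(s) / (c * g)


def sliding_cpg_ratio(seq: str, window: int = 1000, step: int = 100):
    """Return (window_start, ratio) pairs for every full window along seq.

    The 1000-bp default follows the caption; the step size is an assumption.
    """
    return [
        (start, cpg_obs_exp(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

Plotting the returned ratios against window position for a cluster sequence would give a profile comparable in kind to the panels described above, with peaks expected near variable region exons.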

Figure 5

Percent identity plot (PIP) of the _Pcdh_α (A) and _Pcdh_γ (B) genomic sequences between mouse and human by using the PipMaker program with the chaining option. The mouse genomic sequences are shown on the _x_-axis, and the percentage sequence identities (50%–100%) are shown on the _y_-axis. Annotation of the mouse sequences is illustrated at the top of the sequences by solid color boxes. The repeats of mouse sequence are depicted as follows: (black pointed boxes) LINE2s; (light gray pointed boxes) LINE1s; (dark gray pointed boxes) LTRs; (black triangles) MIRs; (light gray triangles) SINEs other than MIRs; (dark gray triangles) other repeats; (white boxes) simple repeats. Short yellow boxes are CpG islands where the ratio of CpG/GpC exceeds 0.75, and short green boxes are CpG islands where the ratio of CpG/GpC is between 0.60 and 0.75. (MDIA1) the last exon of mouse diaphanous gene 1.
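The two CpG/GpC thresholds in this caption can be illustrated with a short sketch. The function names are hypothetical and this is not PipMaker's implementation; any further criteria PipMaker applies before calling a region a CpG island are not stated in the caption and are not modeled here.

```python
from typing import Optional


def cpg_gpc_ratio(seq: str) -> Optional[float]:
    """Ratio of CpG to GpC dinucleotide counts; None if the region has no GpC."""
    s = seq.upper()
    gpc = s.count("GC")
    if gpc == 0:
        return None
    return s.count("CG") / gpc


def island_class(ratio: Optional[float]) -> Optional[str]:
    """Map a CpG/GpC ratio to the caption's color code.

    'yellow' for ratios above 0.75, 'green' for 0.60 to 0.75, None otherwise.
    """
    if ratio is None:
        return None
    if ratio > 0.75:
        return "yellow"
    if ratio >= 0.60:
        return "green"
    return None
```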

Figure 6

Upstream sequences of orthologous genes are more conserved than paralogous genes. The maximal sequence identities of all 100-bp segments within a 150-bp sliding window were computed for each gene pair. The _x_-axis represents the end position of the sliding window relative to the translation-start codon. The _y_-axis represents the percentage sequence identities. Shown are the average of 100-bp-segment maximal identities of all orthologous (solid lines with standard deviation) gene pairs in _Pcdh_α (A) and _Pcdh_γ (B) gene clusters. Also shown are the maximal identities between each gene and all the other paralogous members (excluding C-type protocadherin genes) of the same gene cluster (broken lines without standard deviation). The maximal identities for each orthologous gene pair in C-type protocadherin genes are shown individually in C. Note that the conserved region upstream of C-type protocadherin genes is larger than that of other protocadherin genes.
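One way to read the windowed comparison in this caption is sketched below for a single gene pair. The interpretation is an assumption: within each 150-bp window, every 100-bp segment of one sequence is compared without gaps against every 100-bp segment of the other, and the best percent identity is kept. The function names, the 10-bp step, and the ungapped comparison are illustrative choices, not details given in the caption.

```python
def identity(a: str, b: str) -> float:
    """Percent identity of two equal-length strings, compared position by position."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def max_segment_identity(win_a: str, win_b: str, seg: int = 100) -> float:
    """Best identity over all pairs of seg-length segments from the two windows."""
    best = 0.0
    for i in range(len(win_a) - seg + 1):
        for j in range(len(win_b) - seg + 1):
            best = max(best, identity(win_a[i:i + seg], win_b[j:j + seg]))
    return best


def sliding_max_identity(up_a: str, up_b: str, window: int = 150,
                         seg: int = 100, step: int = 10):
    """Scan two upstream sequences that both end just before the start codon.

    Returns (window_end, score) pairs, where window_end is relative to the
    translation start codon (0 means the window ends immediately upstream).
    """
    n = min(len(up_a), len(up_b))
    a, b = up_a[-n:], up_b[-n:]  # align both sequences at the start codon
    out = []
    for start in range(0, n - window + 1, step):
        end = start + window
        out.append((end - n, max_segment_identity(a[start:end], b[start:end], seg)))
    return out
```

Averaging the per-position scores over all orthologous pairs in a cluster, and repeating the scan for paralogous pairs, would correspond to the solid and broken curves described in the caption.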

Figure 7

Conserved sequences upstream from constant region exon 1 of _Pcdh_α (A) and _Pcdh_γ (B) gene clusters. The identical nucleotides are shown by short vertical lines. The relative positions to the start nucleotide of constant region exon 1 are shown at the beginning and end of each sequence.

Figure 8

Alignment of conserved sequence motif upstream of protocadherin coding region. Shown are the conserved sequences and their relative positions to the translation start codon in mouse _Pcdh_α (A), _Pcdh_β (B), and _Pcdh_γ (C) and human _Pcdh_α (D), _Pcdh_β (E), and _Pcdh_γ (F) gene clusters. The probability of finding the motif within -290 to -150 nucleotides upstream of the translation start codon is shown within parentheses at right. The consensus sequences are shown below each panel. The conserved nucleotides are shown with white letters on a black background. The core sequences are highlighted with yellow bold letters on a red background.
