Noncoding sequences conserved in a limited number of mammals in the SIM2 interval are frequently functional - PubMed

Comparative Study

Kelly A Frazer et al. Genome Res. 2004 Mar.

Abstract

Cross-species DNA sequence comparison is a fundamental method for identifying biologically important elements, because functional sequences are evolutionarily conserved, whereas nonfunctional sequences drift. A recent genome-wide comparison of human and mouse DNA discovered over 200,000 conserved noncoding sequences of unknown function. Multispecies DNA comparison has been proposed as a method to prioritize these conserved noncoding sequences for functional analysis, based on the hypothesis that elements present in many species are more likely to be functional than elements present in a limited number of species. Here, we perform a comparative analysis of the single-minded 2 (SIM2) gene interval on human chromosome 21 with horse, cow, pig, dog, cat, and mouse DNA. We classify conserved sequences based on the number of mammals in which they are present, and experimentally test sequences in each class for function. As hypothesized, conserved sequences present in many mammals are frequently functional. Additionally, we demonstrate that sequences conserved in a limited number of mammals are also frequently functional. Examination of genomic deletions in chimpanzee and rhesus macaque DNA showed that several putatively functional conserved noncoding human sequences were absent in these primates. These findings suggest that functional conserved noncoding human sequences can be missing in other mammals, even in closely related primate species.

Figures

Figure 1

Conserved human sequences in the 10-kb interval surrounding the first exon of SIM2 (yellow rectangle). The conserved elements are shown relative to their position in the human reference sequence (horizontal axis) and their percent conformance (vertical axis). (Position numbers) The position on human 21q (NCBI contig NT_002836). (A) Rectangles rising above the veil indicate conserved human–mouse elements identified by two methods; (blue) sequence alignment (≥100 bp and 80% identity); (gray) 21q array data (≥30 bp and 60% conformance). Red rectangles at the bottom indicate the positions of interspersed repeats, which were not tiled on the arrays; conformance information is therefore absent for them. (B) Rectangles indicate conserved human elements (≥30-nt length and ≥60% conformance) identified by comparisons of horse, cow, pig, dog, cat, and mouse DNA by hybridization to human 21q arrays.
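The two cutoffs quoted in this caption can be written down as a simple filter. The following is only a minimal sketch: the function name, the method labels, and the idea of passing a precomputed length and percent score are assumptions made for illustration, and the 80% identity figure is treated here as a minimum. It does not reproduce how the alignments or the array conformance values were actually computed.

```python
# Cutoffs taken from the caption: (minimum length in bp, minimum percent score).
# "alignment" uses percent identity; "array" uses percent conformance.
# The dictionary keys are hypothetical labels, not names used by the authors.
THRESHOLDS = {
    "alignment": (100, 80.0),
    "array": (30, 60.0),
}


def is_conserved(length_bp, percent_score, method):
    """Return True if an element meets the length and score cutoffs for `method`."""
    min_length, min_percent = THRESHOLDS[method]
    return length_bp >= min_length and percent_score >= min_percent
```

Under these assumptions, a 45-bp element with 72% conformance would pass the array criteria but not the alignment criteria.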

Figure 2

Evolutionarily conserved base pairs in a 365-kb region on human chromosome 21 containing the SIM2 locus. Six species are compared with human genomic DNA for evolutionary conservation. The multispecies data set represents the composite of conserved sequences identified in all six species. The bars represent the percent of the 365,000 bp that are in elements conserved (≥30-nt length and ≥60% conformance) between humans and different numbers of the six species. Base pairs conserved only between the indicated species and humans (unique, dark blue); base pairs conserved between humans and two to five species (limited, red); base pairs conserved in all mammals analyzed (common, yellow). Pale blue bars represent the total percent of base pairs conserved. For details see Supplemental information.
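The unique / limited / common classification described in this caption can be illustrated with a short sketch. It assumes, hypothetically, that each human base pair has already been annotated with the set of species in which it lies inside a conserved element; the function names and the input layout are not from the paper.

```python
# The six species compared with human in this study.
SPECIES = ("horse", "cow", "pig", "dog", "cat", "mouse")


def conservation_class(species_hits):
    """Classify a base pair by how many of the six species conserve it."""
    n = len(set(species_hits) & set(SPECIES))
    if n == 0:
        return "none"
    if n == 1:
        return "unique"      # conserved with exactly one species
    if n == len(SPECIES):
        return "common"      # conserved in all mammals analyzed
    return "limited"         # conserved with two to five species


def percent_by_class(per_base_hits, region_length):
    """Percent of the region's base pairs falling in each conserved class.

    `per_base_hits` maps a position to the set of species conserving it
    (a hypothetical input format chosen for this example).
    """
    counts = {"unique": 0, "limited": 0, "common": 0}
    for hits in per_base_hits.values():
        label = conservation_class(hits)
        if label in counts:
            counts[label] += 1
    return {label: 100.0 * n / region_length for label, n in counts.items()}
```

For the interval in the figure, `region_length` would be 365,000; the per-base annotations themselves would have to come from the array data, which this sketch does not model.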

Figure 3

Noncoding sequences upstream of SIM2 chosen for functional characterization. (A) Conserved sequences (c1–c10) and nonconserved sequences (n1–n5) are located within four different intervals (coordinates based on NCBI contig NT_002836) (n6 is located ∼94 kb upstream). Visualization plots (see Fig. 1) show the species in which elements c1–c10 are conserved. (B) Conserved sequences deleted in the chimpanzee and rhesus macaque genomes. Comparison of syntenic human (H), gorilla (G), chimpanzee (C), and rhesus macaque (R) long-range PCR (LR–PCR) products by gel electrophoresis shows the deletion of interval 1 conserved sequences in chimpanzee genomic DNA and the deletion of interval 3 conserved sequences in rhesus macaque genomic DNA (yellow arrows). The visualization plots, generated by hybridization of the primate LR–PCR products to the human 21q arrays, indicate the positions of the human sequences (highlighted in yellow in A and B) deleted in the chimpanzee and rhesus macaque genomes. The deletions correspond to the drop in percent conformance, plotted on the vertical scale relative to the position in the human reference sequence. Green horizontal lines at the top of the plots indicate the positions of the LR–PCR products.

Figure 4

Functional characterization of noncoding sequences. (A) Expression values of 16 independent luciferase reporter constructs as assayed by transfection analysis; (filled bars) conserved noncoding sequences; (open bars) nonconserved noncoding sequences inserted in front of the SIM2 promoter (see Fig. 3A). Luciferase expression values, normalized against β-galactosidase expression values, are averages of 12 individual experiments (each construct was analyzed in triplicate on four different days) and are expressed as the percent increase in activity over the control (the luciferase reporter construct containing only the SIM2 promoter). The means across conserved and nonconserved sequences (dotted lines) were significantly different (P ∼ 0.0047, two sample _t_-test). Error bars, one SD. (B) Electrophoretic mobility-shift patterns of noncoding DNA fragments. (Lane 1, highlighted) DNA fragment alone; (lane 2) DNA fragment incubated with nuclear extract. (Red arrows) Band-shift indicating DNA–protein binding. Conserved elements c4 and c7 were too long for gel-shift analysis.
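The arithmetic behind panel A can be sketched as follows, under stated assumptions. The caption says luciferase values were normalized against β-galactosidase and expressed as percent increase over the promoter-only control, and that group means were compared with a two-sample t-test. The caption does not say whether equal variances were assumed, so the pooled (Student's) form below is an assumption, and all function and parameter names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def percent_increase_over_control(luc, bgal, control_luc, control_bgal):
    """Percent increase of normalized reporter activity over the control.

    Each luciferase reading is divided by the matching beta-galactosidase
    reading before the construct is compared with the promoter-only control.
    """
    normalized = luc / bgal
    control = control_luc / control_bgal
    return 100.0 * (normalized - control) / control


def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic with pooled variance, plus degrees of freedom.

    Assumes equal variances in the two groups (an assumption of this sketch).
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df
```

Converting the statistic to a P value requires the t distribution, which is left to a statistics package; the sketch stops at the statistic and its degrees of freedom.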
