A large family of ancient repeat elements in the human genome is under strong selection

Michael Kamal et al. Proc Natl Acad Sci U S A. 2006.

Abstract

Although conserved noncoding elements (CNEs) constitute the majority of sequences under purifying selection in the human genome, they remain poorly understood. CNEs seem to be largely unique, with no large families of similar elements reported to date. Here, we search for CNEs among the ancestral repeat classes in the human genome and report the discovery of a large CNE family containing >900 members. This family belongs to the MER121 class of repeats. Although the MER121 family members show considerable sequence variation among one another, the individual copies show striking conservation in orthologous locations across the human, dog, mouse, and rat genomes. The element is also present and conserved in orthologous locations in the marsupial, but its genome-wide dispersal postdates the divergence from birds. The comparative genomic data indicate that MER121 does not encode a family of either protein-coding or RNA genes. Although the precise function of these elements remains unknown, the evidence suggests that this unusual family may play a cis-regulatory or structural role in mammalian genomes.

Conflict of interest statement

No conflicts declared.

Figures

Fig. 1.

Conservation and orthologous bases of the top 200 human MER121 instances most similar to the consensus. Human instances are aligned to one another by progressive alignment, starting from the copy most similar to the consensus. This yields a multiple alignment with 200 rows, one for each human instance. To incorporate orthologous sequence from the other species, each human base is replaced by its corresponding column in the four-way human, dog, mouse, and rat (HDMR) multiple alignment, enlarging the alignment to 800 rows (200 × 4). (A) Positions are colored according to four-way conservation (red, bases with perfect four-way conservation; black, bases with a four-way alignment but without perfect conservation; gray, human bases lacking an orthologous base in at least one other species). (B) Positions are colored according to their DNA base (blue, A; green, G; orange, C; red, T).
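The row expansion and panel A coloring described for Fig. 1 can be sketched in Python. This is a minimal illustration, not the authors' code: the column representation (a dict from a hypothetical species key to an aligned base, with "-" for a missing ortholog), the function names, and the simplification that gaps in the human-to-human alignment are not modeled are all assumptions.

```python
from typing import Dict, List

# Assumed species keys, in human, dog, mouse, rat (HDMR) order.
SPECIES = ("human", "dog", "mouse", "rat")
# Panel B base colors as listed in the caption.
BASE_COLORS = {"A": "blue", "G": "green", "C": "orange", "T": "red"}

Column = Dict[str, str]  # species key -> aligned base; "-" means no orthologous base


def classify_column(column: Column) -> str:
    """Return the panel A color for one human base and its HDMR column."""
    bases = [column[s].upper() for s in SPECIES]
    if any(b == "-" for b in bases[1:]):
        return "gray"   # no orthologous base in at least one other species
    if len(set(bases)) == 1:
        return "red"    # perfect four-way conservation
    return "black"      # aligned in all four species, but not identical


def expand_rows(human_rows: List[List[Column]]) -> List[str]:
    """Replace each human row by one row per species (e.g. 200 rows -> 800)."""
    expanded = []
    for row in human_rows:
        for species in SPECIES:
            expanded.append("".join(col[species] for col in row))
    return expanded
```

Each human row becomes four consecutive rows (human, dog, mouse, rat), so 200 human instances yield the 800-row display described in the caption.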

Fig. 2.

Conservation profile along the MER121 consensus. (A) Percentage of aligned human MER121 elements with an aligned orthologous base in the other species at the indicated position. (B) Percentage of these aligned bases showing perfect conservation across the species. Red curves indicate four-way comparison of HDMR; blue curves indicate two-way comparison of human and marsupial.
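The two panels of Fig. 2 amount to two nested percentages per consensus position. A minimal Python sketch follows. It assumes each human instance has been projected onto consensus coordinates as a list of columns (None where the copy does not cover a position), and it reads "an aligned orthologous base in the other species" as a base in every compared species. The data layout, species keys (including the "marsupial" key), and function name are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Column = Dict[str, str]  # species key -> aligned base; "-" means no orthologous base


def conservation_profile(
    instances: Sequence[Sequence[Optional[Column]]],
    others: Sequence[str],
) -> List[Tuple[float, float]]:
    """Per consensus position, return (panel A %, panel B %).

    instances[i][p] is the column for human instance i at consensus position p,
    or None if that copy does not align to p. `others` lists the non-human
    species being compared.
    """
    length = max(len(inst) for inst in instances)
    profile = []
    for p in range(length):
        # Human copies that align to this consensus position.
        covering = [inst[p] for inst in instances
                    if p < len(inst) and inst[p] is not None]
        # Panel A numerator: copies with an orthologous base in every other species.
        with_ortho = [c for c in covering if all(c[s] != "-" for s in others)]
        # Panel B numerator: of those, copies identical to the human base everywhere.
        perfect = [c for c in with_ortho
                   if all(c[s].upper() == c["human"].upper() for s in others)]
        pct_a = 100.0 * len(with_ortho) / len(covering) if covering else 0.0
        pct_b = 100.0 * len(perfect) / len(with_ortho) if with_ortho else 0.0
        profile.append((pct_a, pct_b))
    return profile
```

Calling it with `others=("dog", "mouse", "rat")` corresponds to the four-way (red) curves, and with a single marsupial key to the two-way (blue) curves.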

Fig. 3.

Frequency of insertions and deletions of various lengths within aligned regions for MER121 (A), MER119 (B), and protein-coding regions (C). Graphs show gap events in the human–dog comparison extracted from the four-way HDMR alignments. Gap events are human bases that align to a “-” character in dog.
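Counting gap events by length, as plotted in Fig. 3, can be sketched in Python for one aligned pair of human and dog rows. This assumes the two rows have already been projected out of the four-way alignment and that columns gapped in both species (gaps introduced by mouse or rat) are dropped before counting. Both choices and the function name are illustrative, not taken from the paper.

```python
from collections import Counter
from itertools import groupby


def gap_length_histogram(human: str, dog: str) -> Counter:
    """Map gap length -> number of gap events in a human/dog row pair.

    A gap event is taken to be a maximal run of human bases aligned to "-"
    in dog, after dropping columns where both rows are "-".
    """
    if len(human) != len(dog):
        raise ValueError("aligned rows must have equal length")
    # Project to the pairwise alignment by removing all-gap columns.
    pairs = [(h, d) for h, d in zip(human, dog) if not (h == "-" and d == "-")]
    hist = Counter()
    # Group consecutive columns by whether they are "human base vs dog gap".
    for is_gap, run in groupby(pairs, key=lambda hd: hd[0] != "-" and hd[1] == "-"):
        if is_gap:
            hist[sum(1 for _ in run)] += 1
    return hist
```

Because all-gap columns are removed first, a mouse or rat insertion inside a dog deletion does not split it into two shorter events.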

Fig. 4.

Conservation rate of 6-mers along the MER121 consensus. Histogram shows the probability that the 6-mer at the indicated position along the consensus shows perfect four-way conservation across HDMR. The 6-mer profile is notably more peaked than the single-base conservation rate shown in Fig. 2B.
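One way to compute a 6-mer profile like the one in Fig. 4 is sketched in Python below. It assumes per-base calls have already been made for each instance (True for perfect four-way conservation, False otherwise, None where the instance does not cover the position) and that the probability at a position is estimated over instances covering the whole 6-base window. The input layout and function name are hypothetical.

```python
from typing import List, Optional, Sequence


def kmer_conservation(
    conserved: Sequence[Sequence[Optional[bool]]], k: int = 6
) -> List[float]:
    """Fraction of instances whose k-mer starting at each consensus position
    is perfectly conserved at every base.

    conserved[i][p] is True if instance i shows perfect four-way conservation
    at consensus position p, False if it does not, and None if instance i
    does not cover p.
    """
    length = max(len(row) for row in conserved)
    rates = []
    for p in range(length - k + 1):
        # Windows p..p+k-1 from instances long enough to contain them.
        windows = [row[p:p + k] for row in conserved if len(row) >= p + k]
        # Keep only instances that cover every position of the window.
        covered = [w for w in windows if all(x is not None for x in w)]
        hits = sum(1 for w in covered if all(w))
        rates.append(hits / len(covered) if covered else 0.0)
    return rates
```

Setting `k=1` gives the per-base analogue under the same conventions, which makes it easy to compare how much sharper the k-mer profile is.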

Fig. 5.

Size distribution of MER121 clusters. A cluster consists of human copies of MER121 found within a region of 1.65 Mb, half the expected distance between adjacent copies if copies were distributed uniformly across the genome.
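The clustering threshold and size distribution behind Fig. 5 can be sketched in Python. Under a uniform distribution, the expected spacing between neighboring copies is the genome length divided by the number of copies, and the threshold is half of that. The sketch reads "within a region of 1.65 Mb" as a single-linkage rule on neighboring copies of the same chromosome. Treating it as a limit on the total cluster span would be an equally plausible reading. The function names and the (chromosome, position) input format are hypothetical.

```python
from collections import Counter
from itertools import groupby
from typing import Dict, List, Tuple

Copy = Tuple[str, int]  # (chromosome name, position in bases)


def half_expected_spacing(genome_length: float, n_copies: int) -> float:
    """Half the mean distance between copies spread uniformly over the genome."""
    return genome_length / n_copies / 2


def clusters(copies: List[Copy], max_gap: int) -> List[List[Copy]]:
    """Single-linkage clusters: neighboring copies on the same chromosome
    join a cluster when they are at most max_gap bases apart."""
    result: List[List[Copy]] = []
    for _, group in groupby(sorted(copies), key=lambda c: c[0]):
        current: List[Copy] = []
        for copy in group:
            if current and copy[1] - current[-1][1] > max_gap:
                result.append(current)  # gap too large: close the cluster
                current = []
            current.append(copy)
        result.append(current)  # each chromosome group is non-empty
    return result


def size_distribution(copies: List[Copy], max_gap: int) -> Dict[int, int]:
    """Map cluster size -> number of clusters of that size."""
    return dict(Counter(len(c) for c in clusters(copies, max_gap)))
```

Singleton copies appear as clusters of size 1, so the distribution covers every copy rather than only the multi-copy groups.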
