Whole-genome analysis of Alu repeat elements reveals complex evolutionary history

Alkes L Price et al. Genome Res. 2004 Nov.

Abstract

Alu repeats are the most abundant family of repeats in the human genome, with over 1 million copies comprising 10% of the genome. They have been implicated in human genetic disease and in the enrichment of gene-rich segmental duplications in the human genome, and they form a rich fossil record of primate and human history. Alu repeat elements are believed to have arisen from the replication of a small number of source elements, whose evolution over time gives rise to the 31 Alu subfamilies currently reported in Repbase Update. We apply a novel method to identify and statistically validate 213 Alu subfamilies. We build an evolutionary tree of these subfamilies and conclude that the history of Alu evolution is more complex than previous studies had indicated.

Figures

Figure 1.

Applicability of _k_-means clustering to different kinds of clustering problems. Disjoint clusters of similar size are easily identified (A). Small subfamilies nested inside large subfamilies, a typical scenario in Alu repeat subfamilies, are not easily identified, because there is a tendency to split off a larger cluster (B) instead of identifying the nested subfamily (C).
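The behavior described in this caption can be reproduced on toy data. The sketch below is a plain one-dimensional Lloyd's _k_-means written for illustration only; it is not the method used in the paper, and the function name `kmeans_1d`, the data points, and the starting centers are all assumptions made up for this example.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Plain Lloyd's k-means in one dimension; returns (centers, labels)."""
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(len(centers)), key=lambda j: abs(p - centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Case (A): two disjoint clusters of equal size are separated cleanly.
disjoint = [float(i) for i in range(10)] + [100.0 + i for i in range(10)]
centers_a, labels_a = kmeans_1d(disjoint, [0.0, 109.0])

# Cases (B)/(C): a broad family (101 made-up points spread over 0..100)
# with a tight 9-point subfamily nested inside it near 30.
family = [float(i) for i in range(101)]
nested = [29.6 + 0.1 * i for i in range(9)]
points = family + nested

# Even when one starting center sits exactly on the nested subfamily,
# k-means drifts toward splitting the broad family in half.
centers_b, labels_b = kmeans_1d(points, [30.0, 70.0])
nested_label = labels_b[len(family)]
cluster_size = labels_b.count(nested_label)

print("disjoint labels:", labels_a)
print("nested case centers:", centers_b)
print("size of cluster holding the nested points:", cluster_size)
```

In the nested case the cluster that contains the 9 nested points ends up with 60 members, so the subfamily is absorbed into one half of the broad family rather than being isolated.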

Figure 2.

Aligned consensus sequences of selected subfamilies. (Top) The consensus sequence of the entire Alu family, with positions labeled from 1 to 282. (Middle) The consensus sequences of six Alu subfamilies we identified that are currently reported in Repbase Update: _Alu_Jo, _Alu_Sx, _Alu_Sq, _Alu_Sp, _Alu_Y, and _Alu_Ya5; the few discrepancies between our consensus sequences and the consensus sequences reported in Repbase Update occur mostly at CpG dinucleotide positions, which are ill-determined because of frequent mutation. (Bottom) The consensus sequences of six Alu subfamilies we identified that are not currently reported in Repbase Update: _Alu_Sx_3, _Alu_Sx_5, _Alu_Sq_3, _Alu_Sg_4, _Alu_Sc_8, and _Alu_Y_8.
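As a rough illustration of what a consensus sequence is and why CpG positions are worth flagging, the sketch below takes a column-wise majority vote over aligned sequences and lists the 1-based positions where the consensus has a CpG dinucleotide. It is a minimal sketch, not the procedure used in the paper or by Repbase Update; the function names and the four short sequences are made up, and gaps are assumed to be absent.

```python
from collections import Counter


def majority_consensus(aligned):
    """Column-wise majority vote over equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*aligned))


def cpg_positions(consensus):
    """1-based start positions of CpG dinucleotides in the consensus."""
    return [i + 1 for i in range(len(consensus) - 1)
            if consensus[i:i + 2] == "CG"]


# Hypothetical toy alignment (not real Alu data). The three variant
# sequences each differ from the majority at a CpG site.
toy_alignment = [
    "GGCCGGGCGC",
    "GGCCGGGTGC",
    "GGCTGGGCGC",
    "GGCCGGGCAC",
]

consensus = majority_consensus(toy_alignment)
cpg_sites = cpg_positions(consensus)
print(consensus, cpg_sites)
```

With real data, positions listed by `cpg_positions` would be the ones where a majority vote is least stable, which matches the caption's remark about discrepancies at CpG dinucleotides.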

Figure 3.

Evolutionary tree of the 31 subfamilies currently reported in Repbase Update. (Large nodes) Subfamilies with more than 10,000 elements; (medium nodes) 1000 to 10,000 elements; (small nodes) fewer than 1000 elements. Each of the 6 Repbase Update subfamilies listed in Figure 2 is labeled. The _Alu_J, _Alu_S, and _Alu_Y classes of subfamilies are contained in boxes.

Figure 4.

Evolutionary tree of the 213 subfamilies we identified. (Large nodes) Subfamilies with more than 10,000 elements; (medium nodes) 1000 to 10,000 elements; (small nodes) fewer than 1000 elements. Subfamilies listed in Repbase Update are colored blue, and the 6 novel subfamilies listed in Figure 2 are colored red. Each of the subfamilies listed in Figure 2 is labeled. A rendition of this tree with every node labeled is available in the Supplementary materials online. The _Alu_J, _Alu_S, and _Alu_Y classes of subfamilies are contained in boxes; not all subfamilies fit into one of these classes. At the right, a timeline roughly depicts the average divergence of each subfamily from its consensus sequence, together with the approximate age obtained by applying a constant scaling factor of 4 million years per 1% divergence from the consensus sequence.
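The age scale in this caption is simple arithmetic: average percent divergence from the consensus, multiplied by 4 million years per 1%. The sketch below works through that calculation on made-up sequences. Only the scaling factor comes from the caption; the function names, the toy data, and the choice to count plain mismatches on gap-free alignments are assumptions for illustration.

```python
MYR_PER_PERCENT = 4.0  # scaling factor quoted in the Figure 4 caption


def percent_divergence(seq, consensus):
    """Percentage of aligned positions where seq differs from the consensus."""
    if len(seq) != len(consensus):
        raise ValueError("sequence and consensus must have the same length")
    mismatches = sum(1 for a, b in zip(seq, consensus) if a != b)
    return 100.0 * mismatches / len(consensus)


def approximate_age_myr(members, consensus):
    """Average divergence of the members, scaled to millions of years."""
    average = sum(percent_divergence(s, consensus) for s in members) / len(members)
    return average * MYR_PER_PERCENT


# Hypothetical 10-base toy data: one member matches the consensus exactly,
# the other differs at one position (10% divergence).
toy_consensus = "ACGTACGTAC"
toy_members = ["ACGTACGTAC", "ACGTACGTAA"]

age = approximate_age_myr(toy_members, toy_consensus)
print(age)  # average divergence 5% -> 20.0 million years
```

Because the factor is constant, the result is only as reliable as the assumption of a uniform substitution rate, which is why the caption describes the ages as approximate.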

WEB SITE REFERENCES

    1. http://repeatmasker.org; RepeatMasker.
    2. http://www.cs.ucsd.edu/~aprice/alu.html; implementation of our algorithm.
