Genome-scale evolution: reconstructing gene orders in the ancestral species

Guillaume Bourque et al. Genome Res. 2002 Jan.

Abstract

Recent progress in genome-scale sequencing and comparative mapping raises new challenges in studies of genome rearrangements. Although the pairwise genome rearrangement problem is well studied, algorithms for reconstructing rearrangement scenarios for multiple species are greatly needed. Previous approaches to the multiple genome rearrangement problem were largely based on the breakpoint distance rather than on the more biologically accurate rearrangement (reversal) distance. Another shortcoming of the existing software tools is their inability to analyze rearrangements (inversions, translocations, fusions, and fissions) of multichromosomal genomes. This paper proposes a new multiple genome rearrangement algorithm that is based on the rearrangement (rather than breakpoint) distance and that is applicable to both unichromosomal and multichromosomal genomes. We further apply this algorithm to genome-scale phylogenetic tree reconstruction and to deriving ancestral gene orders. In particular, our analysis suggests a new, improved rearrangement scenario for a very difficult Campanulaceae cpDNA dataset and a putative rearrangement scenario for human, mouse, and cat genomes.
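To make the contrast between the two distances concrete, here is a minimal sketch (not the paper's code). It assumes a unichromosomal genome is a signed permutation stored as a Python list; the helper names `reverse` and `breakpoints` are hypothetical. It applies one signed reversal and counts the adjacencies that the reversal disrupts relative to a target genome.

```python
def reverse(perm, i, j):
    """Signed reversal of perm[i..j] (inclusive): reverse the order and flip every sign."""
    segment = [-g for g in reversed(perm[i:j + 1])]
    return perm[:i] + segment + perm[j + 1:]


def adjacencies(perm):
    """All adjacencies of a signed permutation framed by 0 and n+1, read on both strands."""
    framed = [0] + list(perm) + [len(perm) + 1]
    pairs = set()
    for a, b in zip(framed, framed[1:]):
        pairs.add((a, b))
        pairs.add((-b, -a))  # the same adjacency read on the opposite strand
    return pairs


def breakpoints(perm, target):
    """Number of adjacencies of perm that do not occur in target."""
    target_adj = adjacencies(target)
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum((a, b) not in target_adj for a, b in zip(framed, framed[1:]))


identity = [1, 2, 3, 4, 5]
genome = reverse(identity, 1, 3)      # one reversal of the block 2 3 4
print(genome, breakpoints(genome, identity))
```

A single reversal cuts the genome at two points, so it can create or remove at most two breakpoints. Half the breakpoint count is therefore only a lower bound on the number of reversals needed.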

Figures

Figure 1

Reversal distance, d(π, γ), versus the actual number of reversals performed to transform π into γ, where γ is a genome/permutation that evolved from the identity permutation π = 1, 2, …, 100 by k random reversals. The simulations were repeated 10 times for every k. We compute the average difference between the reversal distance and the actual number of reversals performed (k).
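The experiment in this caption can be imitated at toy scale. The sketch below rests on several assumptions: permutations are signed, each reversal interval is drawn uniformly at random, and the exact reversal distance is obtained by breadth-first search over all signed permutations. The search is only feasible for very small n and is a stand-in for the polynomial-time distance computation that n = 100 would require. The names `all_reversal_distances` and `simulate` are hypothetical.

```python
import random
from collections import deque


def reverse(perm, i, j):
    """Signed reversal of perm[i..j] (inclusive) on a tuple."""
    return perm[:i] + tuple(-g for g in reversed(perm[i:j + 1])) + perm[j + 1:]


def all_reversal_distances(n):
    """Exact reversal distance from the identity to every signed permutation of size n (BFS)."""
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        p = queue.popleft()
        for i in range(n):
            for j in range(i, n):
                q = reverse(p, i, j)
                if q not in dist:
                    dist[q] = dist[p] + 1
                    queue.append(q)
    return dist


def simulate(n, k, rng):
    """Evolve the identity by k random reversals (uniform over intervals: an assumption)."""
    intervals = [(i, j) for i in range(n) for j in range(i, n)]
    perm = tuple(range(1, n + 1))
    for _ in range(k):
        i, j = rng.choice(intervals)
        perm = reverse(perm, i, j)
    return perm


if __name__ == "__main__":
    n, repeats = 5, 10
    rng = random.Random(0)
    dist = all_reversal_distances(n)
    for k in range(1, 9):
        gaps = [k - dist[simulate(n, k, rng)] for _ in range(repeats)]
        print(k, sum(gaps) / repeats)   # average of (actual reversals - reversal distance)
```

The reversal distance never exceeds k, because the k reversals actually performed are themselves a valid scenario. The gap widens as k grows relative to n.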

Figure 2

Comparison of MGR-MEDIAN and GRAPPA (three genomes equidistant from the ancestor). The genomes G_1, G_2, and G_3 are obtained by k reversals each from the ancestral identity permutation 1 2 … n (n = 30 and n = 100). The simulations were repeated 10 times for every ratio #reversals/#markers = 3k/n. (a) and (b) show the average difference between the number of reversals on the tree recovered by the algorithm and the number of reversals on the actual tree (equal to 3k). (c) and (d) show the average reversal distance between the solution recovered and the actual ancestor.
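The quantity being compared here is the score of a median: the total reversal distance from a candidate ancestor to the three genomes. The sketch below is an exhaustive search, explicitly not the MGR-MEDIAN or GRAPPA procedure. It assumes signed permutations and only works for tiny n, because it enumerates every signed permutation. The names `distances_from` and `brute_force_median` are hypothetical.

```python
from collections import deque


def reverse(perm, i, j):
    """Signed reversal of perm[i..j] (inclusive) on a tuple."""
    return perm[:i] + tuple(-g for g in reversed(perm[i:j + 1])) + perm[j + 1:]


def distances_from(source):
    """Exact reversal distance from source to every signed permutation (BFS)."""
    n = len(source)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        p = queue.popleft()
        for i in range(n):
            for j in range(i, n):
                q = reverse(p, i, j)
                if q not in dist:
                    dist[q] = dist[p] + 1
                    queue.append(q)
    return dist


def brute_force_median(g1, g2, g3):
    """Return (median, score) minimizing d(M, g1) + d(M, g2) + d(M, g3) over all M."""
    d1, d2, d3 = (distances_from(g) for g in (g1, g2, g3))
    score, median = min((d1[p] + d2[p] + d3[p], p) for p in d1)
    return median, score


ancestor = (1, 2, 3, 4)
g1 = reverse(ancestor, 0, 0)   # one reversal each, so k = 1
g2 = reverse(ancestor, 1, 1)
g3 = reverse(ancestor, 2, 3)
print(brute_force_median(g1, g2, g3))
```

Several permutations can tie for the best score, so a recovered median need not equal the true ancestor even when its score matches. This is why the caption reports the score difference and the distance to the ancestor separately.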

Figure 3

Comparison of MGR-MEDIAN and GRAPPA (three genomes nonequidistant from the ancestor). The genomes G_1, G_2, and G_3 are obtained by k, k, and 2k reversals, respectively, each from the ancestral identity permutation 1 2 … n (n = 30 and n = 100). The simulations were repeated 10 times for every ratio #reversals/#markers = 4k/n. (a) and (b) show the average difference between the number of reversals on the tree recovered by the algorithm and the number of reversals on the actual tree (equal to 4k). (c) and (d) show the average reversal distance between the solution recovered and the actual ancestor.

Figure 4

Comparison of MGR and GRAPPA (four genomes). We start from an unrooted tree with four leaves and select one of the two internal nodes to be the identity permutation 1 2 … n (n = 30 and n = 100). We then perform k reversals on each branch of the tree to obtain the genomes G_1, G_2, G_3, and G_4 as the four leaves of the tree. The simulations were repeated 10 times for every ratio #reversals/#markers = 5k/n. (a) and (b) show the average difference between the number of reversals on the tree recovered by the algorithm and the number of reversals on the actual tree (equal to 5k). (c) and (d) show the average reversal distance between the best (i.e., closest) internal node in the solution recovered and the identity permutation.
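The data generation described in this caption can be sketched as follows. The sketch assumes signed permutations and a simple random choice of reversal interval. It also assumes that two leaves hang off each internal node, which the caption does not spell out. The names `evolve` and `four_leaf_tree` are hypothetical.

```python
import random


def reverse(perm, i, j):
    """Signed reversal of perm[i..j] (inclusive) on a tuple."""
    return perm[:i] + tuple(-g for g in reversed(perm[i:j + 1])) + perm[j + 1:]


def evolve(perm, k, rng):
    """Apply k random reversals along one branch (interval sampling is an assumption)."""
    n = len(perm)
    for _ in range(k):
        i = rng.randrange(n)
        j = rng.randrange(i, n)
        perm = reverse(perm, i, j)
    return perm


def four_leaf_tree(n, k, rng):
    """Simulate an unrooted four-leaf tree with k reversals on each of its 5 branches."""
    a = tuple(range(1, n + 1))        # internal node chosen as the identity
    b = evolve(a, k, rng)             # internal branch to the second internal node
    leaves = [
        evolve(a, k, rng),            # G_1
        evolve(a, k, rng),            # G_2
        evolve(b, k, rng),            # G_3
        evolve(b, k, rng),            # G_4
    ]
    return a, b, leaves


a, b, leaves = four_leaf_tree(n=30, k=3, rng=random.Random(0))
for name, genome in zip(("G_1", "G_2", "G_3", "G_4"), leaves):
    print(name, genome)
```

The tree has four leaf branches and one internal branch, which accounts for the 5k reversals on the actual tree.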

Figure 5

Comparison of MGR and GRAPPA (m genomes each with 30 markers). The genomes G_1, G_2, …, G_m correspond to a subset of leaves from a complete unrooted binary tree on which we have performed k reversals on each branch. The simulations were repeated 10 times for every m. (a) and (b) show the average difference between the number of reversals on the tree recovered by the algorithm and the number of reversals on the actual tree when k = 2 and k = 3, respectively.

Figure 6

Herpes simplex virus (HSV), Epstein-Barr virus (EBV), and Cytomegalovirus (CMV) gene orders (Hannenhalli et al. 1995) as well as the ancestral gene order (A) and optimal evolutionary scenario recovered by MGR-MEDIAN.

Figure 7

Human, sea urchin, and fruit fly mitochondrial gene orders taken from Sankoff et al. (1996). A is the ancestral gene order suggested by MGR-MEDIAN.

Figure 8

Phylogeny of 11 metazoan genomes reconstructed by MGR. The gene order data is taken from the MGA Source Guide compiled by Jeffrey L. Boore. The genomes come from 6 major metazoan groupings: nematodes (NEM), annelids (ANN), mollusks (MOL), arthropods (ART), echinoderms (ECH), and chordates (CHO). Numbers show the number of reversals.

Figure 9

Phylogeny of the Campanulaceae cpDNA dataset as reconstructed by MGR. Numbers show the number of reversals.

Figure 10

Performance of MGR-MC (three multichromosomal genomes equidistant from the ancestor). The ancestral genomes are obtained from the identity permutation 1 2 … n (n = 30 and n = 100) by inserting b chromosome breaks (b = 2 when n = 30 and b = 9 when n = 100). The genomes G_1, G_2, and G_3 are obtained by k rearrangements each from the ancestral genomes. Each rearrangement is a reversal/translocation with probability p and a fusion/fission with probability 1 − p. The simulations were repeated 10 times for every ratio #rearrangements/#markers = 3k/n. We compute the average score difference, which is the difference between the number of rearrangements on the tree recovered by the algorithm and the actual number of rearrangements (equal to 3k). We also compute the average distance between the solution recovered and the actual ancestor.
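The multichromosomal simulation in this caption can be sketched as below, under stated assumptions. A genome is a list of chromosomes, each a list of signed markers. The even split between reversal and translocation, and between fusion and fission, is an assumption, as is the choice of cut points; the caption only fixes the probability p of the first pair versus the second. The names `ancestral_genome` and `random_rearrangement` are hypothetical.

```python
import random


def ancestral_genome(n, b, rng):
    """Cut the identity 1..n at b distinct random positions, giving b + 1 chromosomes."""
    cuts = sorted(rng.sample(range(1, n), b))
    bounds = [0] + cuts + [n]
    return [list(range(lo + 1, hi + 1)) for lo, hi in zip(bounds, bounds[1:])]


def random_rearrangement(genome, p, rng):
    """Apply one random rearrangement and return the new genome (input is not modified)."""
    genome = [list(c) for c in genome]
    many = len(genome) >= 2
    if rng.random() < p:
        if many and rng.random() < 0.5:
            # Translocation: two chromosomes exchange their suffixes.
            a, b = rng.sample(range(len(genome)), 2)
            i = rng.randrange(len(genome[a]) + 1)
            j = rng.randrange(len(genome[b]) + 1)
            ca, cb = genome[a], genome[b]
            genome[a], genome[b] = ca[:i] + cb[j:], cb[:j] + ca[i:]
        else:
            # Reversal: flip a segment (order and signs) inside one chromosome.
            c = rng.randrange(len(genome))
            chrom = genome[c]
            i = rng.randrange(len(chrom))
            j = rng.randrange(i, len(chrom))
            genome[c] = chrom[:i] + [-g for g in reversed(chrom[i:j + 1])] + chrom[j + 1:]
    else:
        if many and rng.random() < 0.5:
            # Fusion: concatenate two chromosomes into one.
            a, b = rng.sample(range(len(genome)), 2)
            fused = genome[a] + genome[b]
            genome = [c for idx, c in enumerate(genome) if idx not in (a, b)] + [fused]
        else:
            # Fission: split one chromosome with at least two markers.
            candidates = [idx for idx, c in enumerate(genome) if len(c) >= 2]
            if candidates:
                chrom = genome.pop(rng.choice(candidates))
                cut = rng.randrange(1, len(chrom))
                genome += [chrom[:cut], chrom[cut:]]
    return [c for c in genome if c]   # drop chromosomes emptied by a translocation


rng = random.Random(0)
genome = ancestral_genome(n=30, b=2, rng=rng)
for _ in range(5):                    # k = 5 rearrangements with p = 0.8
    genome = random_rearrangement(genome, p=0.8, rng=rng)
print(genome)
```

In this sketch a translocation whose cut points fall at chromosome ends can degenerate into a fusion or leave the genome unchanged, so the number of operations applied is an upper bound on the number of effective rearrangements.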

Figure 11

Ancestral median for human, mouse, and cat genomes found by MGR-MC. We used the gene order of 114 markers spread over the chromosomes in all three species. The numbers above the chromosomes correspond to these 114 markers, and the numbering is such that the human genome corresponds to the identity permutation broken into 20 pieces. The names below the chromosomes are the names of the markers. We attribute a color to each human chromosome. The color of any marker (in any genome) indicates the human chromosome on which the homolog of this marker lies. Each marker segment is traversed by a diagonal line. These diagonal lines are drawn so that the human chromosomes are traversed from top left to bottom right, and they are designed to help identify visually where rearrangements occurred. For example, for chromosome X, the gene order of the ancestor coincides with the cat gene order and differs from the human gene order by only one segment consisting of genes 108 and 109 (a break in the diagonal line). The mouse X chromosome is broken into seven segments compared to the ancestor (shown by seven broken segments of the diagonal line).

References

    1. Bafna V, Pevzner P. Sorting by reversals: Genome rearrangements in plant organelles and evolutionary history of X chromosome. Mol Biol Evol. 1995;12:239–246.
    2. Bergeron A. A very elementary presentation of the Hannenhalli-Pevzner theory. In: Proceedings of the Twelfth Annual Symposium on Combinatorial Pattern Matching, Jerusalem, Israel. Vol. 2089 of Lecture Notes in Computer Science. Springer-Verlag, New York; 2001. pp. 106–117.
    3. Berman P, Hannenhalli S. Fast sorting by reversal. In: Combinatorial Pattern Matching, Seventh Annual Symposium. Vol. 1075 of Lecture Notes in Computer Science. Springer, New York; 1996. pp. 168–185.
    4. Blanchette M, Bourque G, Sankoff D. Breakpoint phylogenies. In: Miyano S, Takagi T, editors. Genome Informatics Workshop (GIW 1997). Tokyo: University Academy Press; 1997. pp. 25–34.
    5. Blanchette M, Kunisawa T, Sankoff D. Gene order breakpoint evidence in animal mitochondrial phylogeny. J Mol Evol. 1999;49:193–203.
