Mercator/MAVID CAF1 Alignments

Description

Multiple whole-genome alignments of Drosophila species were generated by Mercator (an orthology mapping program) and MAVID (a multiple alignment program). These alignments were engineered by Anat Caspi.

Alignments

Click on a node to download the alignment of its descendants. Click on a species to download its pairwise alignment with D. melanogaster.

File formats

Orthology Maps

The file "map" gives a symmetric, 1-to-1 mapping (indicative of monotopoorthology) between regions in the different genomes. Lines in the map files are of the form:

[Segment #] [Chrom] [Start] [End] [Strand] ...

where the last 4 fields are repeated for each genome in the map. The fields are tab-delimited. Coordinates are 0-based and half-open (the end coordinate is one more than the coordinate of the last base included in the segment). Pieces for which no orthologous region could be identified in one of the genomes have "NA" in the fields for the appropriate genomes. The order of the genomes in each line is given by the order of the genomes in the name of the map (also given in the file "genomes").
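The line layout above can be read with a few lines of Python. This is a minimal sketch, not part of Mercator: the names `Interval` and `parse_map_line` are hypothetical, the genome names in the example are placeholders, and the genome order is assumed to be supplied by the caller (for instance from the name of the map). It also assumes that all four fields of a genome are "NA" when no orthologous region was found.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive (one past the last base)
    strand: str


def parse_map_line(
    line: str, genomes: List[str]
) -> Tuple[int, Dict[str, Optional[Interval]]]:
    """Parse one tab-delimited map line into (segment number, per-genome intervals)."""
    fields = line.rstrip("\n").split("\t")
    expected = 1 + 4 * len(genomes)
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")

    segment = int(fields[0])
    intervals: Dict[str, Optional[Interval]] = {}
    for i, genome in enumerate(genomes):
        chrom, start, end, strand = fields[1 + 4 * i : 5 + 4 * i]
        if chrom == "NA":
            # No orthologous region identified in this genome.
            intervals[genome] = None
        else:
            intervals[genome] = Interval(chrom, int(start), int(end), strand)
    return segment, intervals


# Placeholder genome names and coordinates, for illustration only.
example = "1\t2L\t100\t250\t+\tNA\tNA\tNA\tNA"
segment, intervals = parse_map_line(example, ["genomeA", "genomeB"])
first = intervals["genomeA"]
# Half-open coordinates make the segment length a plain difference.
length = first.end - first.start  # 150
```

Because the coordinates are 0-based and half-open, an `Interval` can be used directly as a Python slice (`sequence[start:end]`) without any off-by-one adjustment.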

AGP Files

Mercator (the orthology mapping program) assembles draft genomes during the construction of the orthology map. Therefore, for draft genomes, the coordinates given in the map file are in terms of the assembled contigs. For each draft genome assembled in this way, and for a given map, there is a corresponding AGP file specifying the mapping between the original sequence contigs/scaffolds and the Mercator assembled contigs. A description of the AGP file format can be found at NCBI and UCSC.
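As an illustration of how an AGP file relates the two coordinate systems, the sketch below parses a single AGP line. It assumes the files follow the standard AGP layout (tab-delimited; object, object start, object end, part number, component type, then component ID, component start, component end, and orientation for non-gap lines; 1-based inclusive coordinates). Whether the files distributed here match that layout in every detail is an assumption, and the function name `parse_agp_line` and the example identifiers are hypothetical.

```python
from typing import Dict, Optional, Union


def parse_agp_line(line: str) -> Optional[Dict[str, Union[str, int]]]:
    """Parse one AGP line; return None for comments, blank lines, and gap lines."""
    if not line.strip() or line.startswith("#"):
        return None
    f = line.rstrip("\n").split("\t")
    component_type = f[4]
    if component_type in ("N", "U"):
        # Gap lines carry a gap length and type instead of a component.
        return None
    return {
        "object": f[0],                  # assembled contig (map coordinates)
        # Convert 1-based inclusive AGP coordinates to the 0-based,
        # half-open convention used by the map file.
        "object_start": int(f[1]) - 1,
        "object_end": int(f[2]),
        "component": f[5],               # original contig/scaffold
        "component_start": int(f[6]) - 1,
        "component_end": int(f[7]),
        "orientation": f[8],
    }


# Placeholder identifiers, for illustration only.
row = parse_agp_line("assembled1\t1\t500\t1\tW\tscaffold_7\t1\t500\t+")
```

Converting to half-open coordinates at parse time keeps AGP records directly comparable with the intervals in the map file.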

Alignments

Alignments are provided for each colinear orthologous segment set identified by the map. The multiple alignment for segment number n is given in a single multi-fasta file "n/mavid.mfa", which contains a record for each genome that is part of the segment. Note that the order of the records in the multi-fasta file does not necessarily correspond to the order in which the genomes are given in the map file. The title of each record in the multi-fasta file is the genome from which the sequence for that record is obtained.
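Since the record order in the multi-fasta file may differ from the genome order in the map, it is safest to look records up by title rather than by position. The following is a minimal sketch of such a reader; the function name `read_mfa` is hypothetical, the example titles and sequences are placeholders, and it assumes each title line consists of ">" followed by the genome name.

```python
from typing import Dict, Iterable, List, Optional


def read_mfa(lines: Iterable[str]) -> Dict[str, str]:
    """Read multi-fasta lines into a dict mapping record title to aligned sequence."""
    records: Dict[str, str] = {}
    title: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if title is not None:
                records[title] = "".join(chunks)
            title = line[1:].strip()
            chunks = []
        elif title is not None and line:
            # Sequence lines may be wrapped; join them back together.
            chunks.append(line)
    if title is not None:
        records[title] = "".join(chunks)
    return records


# Placeholder records, for illustration only.
alignment = read_mfa([">genomeB", "AC-G", "T", ">genomeA", "ACTG", "T"])

# Rows of a multiple alignment should all have the same length.
row_lengths = {len(seq) for seq in alignment.values()}
assert len(row_lengths) == 1
```

To read an actual file, pass an open file handle, for example `with open("1/mavid.mfa") as fh: alignment = read_mfa(fh)`, where `1` stands for the segment number of interest.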