Aligning multiple genomic sequences with the threaded blockset aligner
Mathieu Blanchette, W James Kent, Cathy Riemer, Laura Elnitski, Arian F A Smit, Krishna M Roskin, Robert Baertsch, Kate Rosenbloom, Hiram Clawson, Eric D Green, David Haussler, Webb Miller
- PMID: 15060014
- PMCID: PMC383317
- DOI: 10.1101/gr.1933104
Mathieu Blanchette et al. Genome Res. 2004 Apr.
Abstract
We define a "threaded blockset," which is a novel generalization of the classic notion of a multiple alignment. A new computer program called TBA (for "threaded blockset aligner") builds a threaded blockset under the assumption that all matching segments occur in the same order and orientation in the given sequences; inversions and duplications are not addressed. TBA is designed to be appropriate for aligning many, but by no means all, megabase-sized regions of multiple mammalian genomes. The output of TBA can be projected onto any genome chosen as a reference, thus guaranteeing that different projections present consistent predictions of which genomic positions are orthologous. This capability is illustrated using a new visualization tool to view TBA-generated alignments of vertebrate Hox clusters from both the mammalian and fish perspectives. Experimental evaluation of alignment quality, using a program that simulates evolutionary change in genomic sequences, indicates that TBA is more accurate than earlier programs. To perform the dynamic-programming alignment step, TBA runs a stand-alone program called MULTIZ, which can be used to align highly rearranged or incompletely sequenced genomes. We describe our use of MULTIZ to produce the whole-genome multiple alignments at the Santa Cruz Genome Browser.
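The projection idea in the abstract can be illustrated with a minimal sketch. It assumes a heavily simplified model in which each block records only the range of positions it covers in each sequence (no strand, no alignment columns); the names `Block`, `project`, and `paired_ranges` and the example coordinates are hypothetical and are not taken from TBA or from Figure 1.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]      # 1-based, inclusive (start, end)
Block = Dict[str, Interval]     # sequence name -> range covered by this block


def project(blocks: List[Block], ref: str) -> List[Block]:
    """Keep the blocks that contain `ref`, ordered by their start in `ref`."""
    with_ref = [b for b in blocks if ref in b]
    return sorted(with_ref, key=lambda b: b[ref][0])


def paired_ranges(projection: List[Block], a: str, b: str) -> Set[Tuple[Interval, Interval]]:
    """Ranges of `a` and `b` that share a block in the given projection."""
    return {(blk[a], blk[b]) for blk in projection if a in blk and b in blk}


# Hypothetical blockset for sequences h (400 bp), m (400 bp) and r (350 bp),
# deliberately listed out of order.
blocks: List[Block] = [
    {"m": (261, 400)},
    {"h": (251, 400), "r": (151, 350)},
    {"h": (101, 250), "m": (91, 260), "r": (1, 150)},
    {"h": (1, 100), "m": (1, 90)},
]

on_m = project(blocks, "m")
on_h = project(blocks, "h")

# Both projections pair the same h and m ranges, because both are views
# of the same underlying set of blocks.
same_pairs = paired_ranges(on_m, "h", "m") == paired_ranges(on_h, "h", "m")
```

Because every projection is derived from one shared set of blocks, two projections cannot disagree about which ranges are aligned to each other; they differ only in which blocks are shown and in what order.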
Figures
Figure 1
(A) Blocks (alignments) of a hypothetical threaded blockset for sequences h (400 bp), m (400 bp) and r (350 bp). Only the range of positions in each alignment is given. (B) Projection of the threaded blockset onto m.
Figure 2
(A) Alignments between the chloroplast genomes of Arabidopsis thaliana and Oenothera elata (evening primrose). Lines running from lower left to upper right indicate positions of matches on the forward strand (relative to the GenBank entries, NC_000932 and OEL271079, respectively), and lines running from upper left to lower right indicate matches in reverse complement. The alignments were computed and displayed by programs used by the PipMaker Web server (Schwartz et al. 2000). (B) Blocks of a threaded blockset for the chloroplast genomes of Arabidopsis and evening primrose.
Figure 3
A threaded blockset for vertebrate HoxA regions, displayed in our interactive blockset viewer Gmaj. (A) The red circle marks a position of interest where the tilapia reference sequence aligns with human. The block containing this position is highlighted in red in all of the alignment panels. Color underlays are blue for exons in the reference sequence and yellow for introns, and the exons are also represented as icons above the alignments. At the top of the Gmaj window, two status lines describe the positions of the mouse pointer and the red circle, respectively. Individual nucleotides for the selected block are displayed in the bottom pane, with the marked position highlighted. (B) The same region projected onto the human sequence. The underlays for human are green for EST evidence, dark blue for antisense RNA, and red for coding sequences. The conserved element from A is part of an alternative 5′-end identified by homology to a human EST from TIGR.
Figure 4
(A) Accuracy of the multiple alignments produced by different aligners on a set of nine simulated mammalian sequences of length ∼50 kb, as measured on the basis of the pairwise alignments induced by different pairs of species. The scores reported are the average of 50 simulation experiments. See the Methods section (Supplemental material) for an explanation of the R parameter. (B) Accuracy of the multiple alignments produced by different aligners on simulated human, mouse, and rat sequences of length ∼50 kb, as measured on the basis of the pairwise alignments induced by different pairs of species. The scores reported are the average of 50 simulation experiments.
Figure 5
Pictorial representation of an application of MULTIZ. M is a human-ref blockset of human, mouse, and rat, whereas N is a cow-ref blockset of cow and dog. MULTIZ uses a pairwise human-ref blockset, G, of human and cow to guide the aligning process. The output is a human-ref blockset of human, mouse, rat, cow, and dog. The reference sequence for each blockset is indicated by capital letters.
Figure 6
UCSC Genome Browser display of HUMOR alignments. (A) Ribosomal protein RPL31. The human/mouse/rat track shows the MULTIZ score normalized as described in the text. The high conservation of exons relative to introns is typical of many genes. (B) Transcription Factor FOS. In highly regulated genes such as this one, it is not unusual to find extensive conservation outside of protein-coding exons. (C) Closeup of a poorly conserved part of an RPL31 intron. When the display is zoomed in close enough, the base-by-base alignment is displayed as well as the score graph. Because the alignment is projected onto the reference sequence, a “Hidden Gaps” row indicates areas where the full alignment would have dashes in the reference sequence row. Clicking on the human/mouse/rat track takes you to a details page that displays the full alignment. (D) Closeup of an exon/intron boundary in FOS. The canonical “GT” 5′ consensus sequence is usually conserved, but then conservation falls off for the rest of the intron.
Figure 7
The TBA implementation.
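The Figure 4 caption says accuracy is measured on the pairwise alignments induced by pairs of species. The sketch below shows one plausible reading of that idea, not the paper's actual scoring: it extracts the aligned position pairs that a multiple alignment induces for two species and reports the fraction of the true pairs that a predicted alignment recovers. The function names, the gap character `-`, and the toy sequences are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple


def induced_pairs(rows: Dict[str, str], a: str, b: str) -> Set[Tuple[int, int]]:
    """Return (i, j) pairs where position i of `a` shares a column with position j of `b` (0-based)."""
    pairs = set()
    i = j = 0
    for ca, cb in zip(rows[a], rows[b]):
        if ca != "-" and cb != "-":
            pairs.add((i, j))
        # Advance each sequence's own coordinate only on non-gap characters.
        if ca != "-":
            i += 1
        if cb != "-":
            j += 1
    return pairs


def agreement(predicted: Dict[str, str], truth: Dict[str, str], a: str, b: str) -> float:
    """Fraction of the true induced pairs for (a, b) that the prediction recovers."""
    true_pairs = induced_pairs(truth, a, b)
    if not true_pairs:
        raise ValueError("the true alignment pairs no positions of these sequences")
    return len(induced_pairs(predicted, a, b) & true_pairs) / len(true_pairs)


# Toy example: rows of equal length, '-' marks a gap. Sequences are made up.
truth = {"human": "ACGT-A", "mouse": "AC-TGA", "rat": "ACGTGA"}
predicted = {"human": "ACGTA", "mouse": "ACTGA", "rat": "ACGTA"}

score = agreement(predicted, truth, "human", "mouse")
```

Here the true alignment pairs four human and mouse positions and the ungapped prediction recovers three of them, so `score` is 0.75. In a simulation setting the true alignment is known by construction, which is what makes a measure of this kind computable.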
WEB SITE REFERENCES
- http://bio.cse.psu.edu/; TBA, simulated test data, and the Gmaj visualization tool.
- http://genome.ucsc.edu; MULTIZ and HUMOR alignments.
Grants and funding
- HG-02238/HG/NHGRI NIH HHS/United States
- P41 HG002371/HG/NHGRI NIH HHS/United States
- 1P41HG02371/HG/NHGRI NIH HHS/United States
- F32 HG002325/HG/NHGRI NIH HHS/United States
- HG02325/HG/NHGRI NIH HHS/United States
- R01 HG002238/HG/NHGRI NIH HHS/United States