Aligning multiple genomic sequences with the threaded blockset aligner
Mathieu Blanchette, W James Kent, Cathy Riemer, Laura Elnitski, Arian F A Smit, Krishna M Roskin, Robert Baertsch, Kate Rosenbloom, Hiram Clawson, Eric D Green, David Haussler, Webb Miller
- PMID: 15060014
- PMCID: PMC383317
- DOI: 10.1101/gr.1933104
Mathieu Blanchette et al. Genome Res. 2004 Apr.
Abstract
We define a "threaded blockset," which is a novel generalization of the classic notion of a multiple alignment. A new computer program called TBA (for "threaded blockset aligner") builds a threaded blockset under the assumption that all matching segments occur in the same order and orientation in the given sequences; inversions and duplications are not addressed. TBA is designed to be appropriate for aligning many, but by no means all, megabase-sized regions of multiple mammalian genomes. The output of TBA can be projected onto any genome chosen as a reference, thus guaranteeing that different projections present consistent predictions of which genomic positions are orthologous. This capability is illustrated using a new visualization tool to view TBA-generated alignments of vertebrate Hox clusters from both the mammalian and fish perspectives. Experimental evaluation of alignment quality, using a program that simulates evolutionary change in genomic sequences, indicates that TBA is more accurate than earlier programs. To perform the dynamic-programming alignment step, TBA runs a stand-alone program called MULTIZ, which can be used to align highly rearranged or incompletely sequenced genomes. We describe our use of MULTIZ to produce the whole-genome multiple alignments at the Santa Cruz Genome Browser.
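The idea of a threaded blockset and its projection onto a reference can be illustrated with a small sketch. The abstract does not give the formal definition, so the code below rests on stated assumptions: each block is reduced to the range of positions it covers in each sequence (as in Figure 1), intervals are 0-based and half-open, and "threaded" is taken to mean that every position of every sequence falls in exactly one block. The sequence lengths follow Figure 1 (h 400 bp, m 400 bp, r 350 bp), but the block ranges and the names `Block`, `covers_each_position_once`, and `project` are hypothetical and are not part of TBA.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # assumed convention: 0-based, half-open [start, end)


@dataclass
class Block:
    """One block, reduced to the range each sequence occupies in it."""
    rows: Dict[str, Interval]


def covers_each_position_once(blocks: List[Block], lengths: Dict[str, int]) -> bool:
    """Check the assumed 'threaded' property: for every sequence, the block
    intervals tile [0, length) with no gaps and no overlaps."""
    for name, length in lengths.items():
        intervals = sorted(b.rows[name] for b in blocks if name in b.rows)
        pos = 0
        for start, end in intervals:
            if start != pos or end <= start:
                return False
            pos = end
        if pos != length:
            return False
    return True


def project(blocks: List[Block], ref: str) -> List[Block]:
    """Project onto a reference: keep the blocks that contain the reference
    sequence, ordered by their position in it."""
    kept = [b for b in blocks if ref in b.rows]
    return sorted(kept, key=lambda b: b.rows[ref][0])


# Hypothetical blocks; only the sequence lengths are taken from Figure 1.
lengths = {"h": 400, "m": 400, "r": 350}
blocks = [
    Block({"h": (250, 400)}),                                 # h only
    Block({"m": (250, 400), "r": (150, 350)}),                # m and r
    Block({"h": (0, 100), "m": (0, 100)}),                    # h and m
    Block({"h": (100, 250), "m": (100, 250), "r": (0, 150)}), # all three
]

print(covers_each_position_once(blocks, lengths))
for block in project(blocks, "m"):
    print(block.rows)
```

Because every projection is read from the same set of blocks, two positions that share a block when viewed from m also share that block when viewed from r. This is the consistency between projections that the abstract describes.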
Figures
Figure 1
(A) Blocks (alignments) of a hypothetical threaded blockset for sequences h (400 bp), m (400 bp) and r (350 bp). Only the range of positions in each alignment is given. (B) Projection of the threaded blockset onto m.
Figure 2
(A) Alignments between the chloroplast genomes of Arabidopsis thaliana and Oenothera elata (evening primrose). Lines running from lower left to upper right indicate positions of matches on the forward strand (relative to the GenBank entries, NC_000932 and OEL271079, respectively), and lines running from upper left to lower right indicate matches in reverse complement. The alignments were computed and displayed by programs used by the PipMaker Web server (Schwartz et al. 2000). (B) Blocks of a threaded blockset for the chloroplast genomes of Arabidopsis and evening primrose.
Figure 3
A threaded blockset for vertebrate HoxA regions, displayed in our interactive blockset viewer Gmaj. (A) The red circle marks a position of interest where the tilapia reference sequence aligns with human. The block containing this position is highlighted in red in all of the alignment panels. Color underlays are blue for exons in the reference sequence and yellow for introns, and the exons are also represented as icons above the alignments. At the top of the Gmaj window, two status lines describe the positions of the mouse pointer and the red circle, respectively. Individual nucleotides for the selected block are displayed in the bottom pane, with the marked position highlighted. (B) The same region projected onto the human sequence. The underlays for human include green for EST evidence, dark blue for antisense RNA, and red for coding sequences. The conserved element from A is part of an alternative 5′ end identified by homology to a human EST from TIGR.
Figure 4
(A) Accuracy of the multiple alignments produced by different aligners on a set of nine simulated mammalian sequences of length ∼50 kb, as measured on the basis of the pairwise alignments induced by different pairs of species. The scores reported are the average of 50 simulation experiments. See the Methods section (Supplemental material) for an explanation of the R parameter. (B) Accuracy of the multiple alignments produced by different aligners on simulated human, mouse, and rat sequences of length ∼50 kb, as measured on the basis of the pairwise alignments induced by different pairs of species. The scores reported are the average of 50 simulation experiments.
Figure 5
Pictorial representation of an application of MULTIZ. M is a human-ref blockset of human, mouse, and rat, whereas N is a cow-ref blockset of cow and dog. MULTIZ uses a pairwise human-ref blockset, G, of human and cow to guide the aligning process. The output is a human-ref blockset of human, mouse, rat, cow, and dog. The reference sequence for each blockset is indicated by capital letters.
Figure 6
UCSC Genome Browser display of HUMOR alignments. (A) Ribosomal protein RPL31. The human/mouse/rat track shows the MULTIZ score normalized as described in the text. The high conservation of exons relative to introns is typical of many genes. (B) Transcription factor FOS. In highly regulated genes such as this one, it is not unusual to find extensive conservation outside of protein-coding exons. (C) Closeup of a poorly conserved part of an RPL31 intron. When the display is zoomed in close enough, the base-by-base alignment is displayed as well as the score graph. Because the alignment is projected onto the reference sequence, a “Hidden Gaps” row indicates areas where the full alignment would have dashes in the reference sequence row. Clicking on the human/mouse/rat track opens a details page that displays the full alignment. (D) Closeup of an exon/intron boundary in FOS. The canonical “GT” 5′ consensus sequence is usually conserved, but then conservation falls off for the rest of the intron.
Figure 7
The TBA implementation.
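The Figure 4 caption states that accuracy is measured on the pairwise alignments induced by a multiple alignment. The sketch below shows one way to extract an induced pairwise alignment and compare it with a known true alignment. The exact score used in the paper is defined in its Methods and is not reproduced here. The measure shown (the fraction of truly aligned position pairs that the predicted alignment recovers), the function names, and the toy sequences are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple


def induced_pairs(rows: Dict[str, str], a: str, b: str) -> Set[Tuple[int, int]]:
    """Return the set of (position in a, position in b) pairs that share a
    column in the multiple alignment. Rows are equal-length gapped strings
    with '-' as the gap character; positions are 0-based and ungapped."""
    pairs = set()
    i = j = 0
    for ca, cb in zip(rows[a], rows[b]):
        if ca != "-" and cb != "-":
            pairs.add((i, j))
        if ca != "-":
            i += 1
        if cb != "-":
            j += 1
    return pairs


def agreement(predicted: Dict[str, str], truth: Dict[str, str], a: str, b: str) -> float:
    """Assumed measure: fraction of truly aligned pairs for species a and b
    that also appear in the predicted alignment."""
    true_pairs = induced_pairs(truth, a, b)
    if not true_pairs:
        return 0.0
    predicted_pairs = induced_pairs(predicted, a, b)
    return len(true_pairs & predicted_pairs) / len(true_pairs)


# Toy alignments; the sequences are invented for illustration only.
truth = {"human": "ACGT-A", "mouse": "AC-TGA", "rat": "ACGTGA"}
predicted = {"human": "ACGT-A", "mouse": "ACT-GA", "rat": "ACGTGA"}

print(agreement(predicted, truth, "human", "mouse"))  # 0.75
print(agreement(predicted, truth, "human", "rat"))    # 1.0
```

Scoring each species pair separately, as the caption describes, shows whether an aligner's errors are concentrated in particular pairs of species.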
WEB SITE REFERENCES
- http://bio.cse.psu.edu/; TBA, simulated test data, and the Gmaj visualization tool.
- http://genome.ucsc.edu; MULTIZ and HUMOR alignments.
Grants and funding
- HG-02238/HG/NHGRI NIH HHS/United States
- P41 HG002371/HG/NHGRI NIH HHS/United States
- 1P41HG02371/HG/NHGRI NIH HHS/United States
- F32 HG002325/HG/NHGRI NIH HHS/United States
- HG02325/HG/NHGRI NIH HHS/United States
- R01 HG002238/HG/NHGRI NIH HHS/United States