Analysis of Multiple Genomic Sequence Alignments: A Web Resource, Online Tools, and Lessons Learned From Analysis of Mammalian SCL Loci
- Michael A. Chapman1,
- Ian J. Donaldson1,
- James Gilbert2,
- Darren Grafham2,
- Jane Rogers2,
- Anthony R. Green1, and
- Berthold Göttgens1,3
- 1 Cambridge Institute for Medical Research, Cambridge, CB2 2XY, UK
- 2 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK
Abstract
Comparative analysis of genomic sequences is becoming a standard technique for studying gene regulation. However, only a limited number of tools are currently available for the analysis of multiple genomic sequences. An extensive data set for the testing and training of such tools is provided by the SCL gene locus. Here we have expanded the data set to eight vertebrate species by sequencing the dog SCL locus and by annotating the dog and rat SCL loci. To provide a resource for the bioinformatics community, all SCL sequences and functional annotations, comprising a collation of the extensive experimental evidence pertaining to SCL regulation, have been made available via a Web server. A Web interface to new tools specifically designed for the display and analysis of multiple sequence alignments was also implemented. The unique SCL data set and new sequence comparison tools allowed us to perform a rigorous examination of the true benefits of multiple sequence comparisons. We demonstrate that multiple sequence alignments are, overall, superior to pairwise alignments for identification of mammalian regulatory regions. In the search for individual transcription factor binding sites, multiple alignments markedly increase the signal-to-noise ratio compared to pairwise alignments.
Footnotes
[Supplemental data is available at www.genome.org and http://hscl.cimr.cam.ac.uk/supplementary_data.html. DNA sequence as described in the paper has been deposited in the GenBank database under accession no. AL731652. The following individuals kindly supplied reagents, samples, or unpublished information as indicated in the paper: R. Li, P. de Jong, and R. Huss.]
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1759004. Article published online before print in January 2004.
3 Corresponding author. E-MAIL bg200{at}cam.ac.uk; FAX 44 1223-336827.
- Accepted November 24, 2003.
- Received July 17, 2003.
Cold Spring Harbor Laboratory Press