Ensembl 2005

Nucleic Acids Res. 2005 Jan 1;33(Database issue):D447-53.

doi: 10.1093/nar/gki138.

T Hubbard, D Andrews, M Caccamo, G Cameron, Y Chen, M Clamp, L Clarke, G Coates, T Cox, F Cunningham, V Curwen, T Cutts, T Down, R Durbin, X M Fernandez-Suarez, J Gilbert, M Hammond, J Herrero, H Hotz, K Howe, V Iyer, K Jekosch, A Kahari, A Kasprzyk, D Keefe, S Keenan, F Kokocinsci, D London, I Longden, G McVicker, C Melsopp, P Meidl, S Potter, G Proctor, M Rae, D Rios, M Schuster, S Searle, J Severin, G Slater, D Smedley, J Smith, W Spooner, A Stabenau, J Stalker, R Storey, S Trevanion, A Ureta-Vidal, J Vogel, S White, C Woodwark, E Birney

T Hubbard et al. Nucleic Acids Res. 2005.

Abstract

The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of large genome sequences. Over the last year the number of genomes available from the Ensembl site has increased by 7 to 16, with the addition of the six vertebrate genomes of chimpanzee, dog, cow, chicken, tetraodon and frog and the insect genome of honeybee. The majority have been annotated automatically using the Ensembl gene build system, showing its flexibility to reliably annotate a wide variety of genomes. With the increased number of vertebrate genomes, the comparative analysis provided to users has been greatly improved, with new website interfaces allowing annotation of different genomes to be directly compared. The Ensembl software system is being increasingly widely reused in different projects showing the benefits of a completely open approach to software development and distribution.

Figures

Figure 1

Screenshot of Ensembl multicontigview. The view shows genome sequence with annotation from human, mouse and chicken, aligned according to DNA–DNA similarity, shown in green. Pairwise similarity is shown between the ‘primary’ genome (human in this case) and each of the other genomes. Menus allow additional genomes to be added to the display. The ‘P’ button allows a different genome to be selected as the primary one. Genes automatically identified as putative orthologues are linked by blue lines. The region shown is centred around the HOXB3 gene in the HOX cluster on human chromosome 17 and is shown to be syntenic with a region on mouse chromosome 11. All Ensembl known gene structures are conserved and have been correctly identified as orthologues. Two novel Ensembl gene structures predicted in mouse are not seen in human. It would be interesting to investigate the corresponding region in human to understand why they were not predicted there. Features such as alignments to cDNAs and proteins can be turned on using the menus to facilitate such a comparison. Putative orthologue prediction and DNA similarity show a much weaker and incomplete link to a region in the chicken genome; however, this is on chromosome Un, which is a fake chromosome composed of fragments that could not be mapped onto chromosomes in the current assembly. Whereas the chicken fragment contains HOXB3 and HOXB1, HOXB4 and others are absent. The putative chicken orthologue for human HOXB4 is found in another chicken fragment in the fake Un chromosome (data not shown), suggesting that the chicken equivalent of the human chromosome 17 HOX cluster is fragmented in the current chicken assembly.

Figure 2

Screenshot of Ensembl genesnpview. This new gene-centric view shows in a single display the genomic context of a gene and its surrounding SNPs. The figure shows the region of the human genome around the HOXB3 gene in the HOX cluster on chromosome 17. The display shows three different resolutions: the genes over a 270 kb region are shown around HOXB3; the HOXB3 gene itself (gene id ENSG00000120093) and the HOXB3 transcript (transcript id ENST00000311626) with intragenic sequence and introns truncated so that it is mainly CDS and untranslated region (UTR) sequence that is shown. By default the flanking regions are truncated to 50 bp. This can be changed with the ‘Context’ menu and in this case has been set to 200 bp, revealing six intronic SNPs. It can be seen that the CDS of the transcript includes one known protein domain (the Pfam PF00046 homeobox domain). There are only two SNPs that fall within the CDS and only one of these is non-synonymous, leading to a proline (P) to threonine (T) amino acid change in the second exon as a result of an A to C base change. There is a further flanking SNP (C to T change) in the 3′-UTR. A table immediately below the figure provides more information about the SNPs that intersect the transcript being viewed. The three menu bars ‘SNP class’, ‘Validation’ and ‘SNP type’ allow the SNPs being displayed to be filtered. If there were multiple transcripts for the gene selected, they would each be displayed. The view thus combines data in a single view that can partly be found in contigview, transview, protview and snpview.
