Ensembl 2004
Journal Article
*To whom correspondence should be addressed. Tel: +44 1223 494983; Fax: +44 1223 494919; Email: th@sanger.ac.uk
Published: 01 January 2004
E. Birney, D. Andrews, P. Bevan, M. Caccamo, G. Cameron, Y. Chen, L. Clarke, G. Coates, T. Cox, J. Cuff, V. Curwen, T. Cutts, T. Down, R. Durbin, E. Eyras, X. M. Fernandez‐Suarez, P. Gane, B. Gibbins, J. Gilbert, M. Hammond, H. Hotz, V. Iyer, A. Kahari, K. Jekosch, A. Kasprzyk, D. Keefe, S. Keenan, H. Lehvaslaiho, G. McVicker, C. Melsopp, P. Meidl, E. Mongin, R. Pettett, S. Potter, G. Proctor, M. Rae, S. Searle, G. Slater, D. Smedley, J. Smith, W. Spooner, A. Stabenau, J. Stalker, R. Storey, A. Ureta‐Vidal, C. Woodwark, M. Clamp, T. Hubbard, Ensembl 2004, Nucleic Acids Research, Volume 32, Issue suppl_1, 1 January 2004, Pages D468–D470, https://doi.org/10.1093/nar/gkh038
Abstract
The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organize biology around the sequences of large genomes. It is a comprehensive and integrated source of annotation of large genome sequences, available via an interactive website, web services or flat files. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and their associated requirements. The system's facilities range from sequence analysis to data storage and visualization, and installations exist around the world at both companies and academic sites. With a total of nine genome sequences available from Ensembl and more to follow, recent developments have focused mainly on closer integration between genomes and external data.
Received September 16, 2003; Accepted September 18, 2003
INTRODUCTION
Genome sequences provide a natural framework around which to organize biological data. In the short time they have been available, genome databases have proved to be invaluable resources for researchers. Ensembl, a joint project between the EBI and the Sanger Institute, is one of the most popular sources of automatic analysis and integration of large genome sequence data. It now contains nine genomes: five vertebrates (human, mouse, rat, fugu and zebrafish), two worms (Caenorhabditis briggsae and Caenorhabditis elegans) and two insects (Drosophila melanogaster and Anopheles gambiae). Ensembl has been involved in the continued analysis of human data and in the analyses of the mouse (1), A. gambiae (2) and C. briggsae genomes. Ensembl gene predictions have also formed the core set of annotations for the forthcoming rat genome analysis. Ensembl remains an entirely open project, with all data freely available and all code openly licensed. It has built a strong network of developers and users in both academia and industry, and is being installed both to mirror Ensembl‐generated data and as a software foundation for users' own projects. Several papers describing specific aspects of Ensembl have recently been submitted (3–6). This paper briefly outlines developments in the project since last year's report (7).
NEW DEVELOPMENTS
Regular update cycle
To streamline the handling of this ever‐changing and ever‐growing volume of data, Ensembl adopted a monthly release cycle in February 2003. Improvements to the web interface and database schema are now released monthly, and new data are incorporated as they become available. Database dumps and flat files are released in sync with updates to the website.
Pre‐ensembl website
A full Ensembl annotation of a genome takes several weeks to complete. To give users immediate access to newly released genome assemblies, Ensembl now offers a pre‐ensembl website (http://pre.ensembl.org/) with limited functionality. This site can be made available only a few days after a genome is released and provides BLAST and SSAHA searching, placement of all known proteins, repeat masking and ab initio gene predictions.
Otter: an extended Ensembl schema for gene curation
During the year, Ensembl developed a new software component called Otter. Otter is an Ensembl database, but with an extended schema and an associated client/server system to support manual gene annotation. The Sanger Institute vertebrate annotation system is being migrated to use Otter, which will then put both automatic (Ensembl) and manual annotation under a single software framework and help greatly with subsequent data integration. The Otter server communicates with annotation clients via an XML format, which allows easy exchange and verification of annotation generated with different systems.
The Apollo genome browser (4), a GMOD component (http://www.gmod.org/) under joint development by Ensembl and the Berkeley Drosophila Genome Project (http://www.bdgp.org/), can be used as an annotation client for Otter. Apollo has also been extended to display data from DAS (distributed annotation system) servers. As an editor, Apollo can view and edit annotation in a comparative genomic context. By connecting to two Otter servers (e.g. human and mouse) and to an Ensembl compara database containing pre‐calculated synteny information between the two genomes, it can display annotation for both genomes and allow each to be edited in the context of its synteny with the other.
ENHANCEMENTS
In addition to these new developments, existing features of Ensembl have been enhanced continuously over the year. Users are encouraged to read the ‘What’s new’ pages that accompany every release, because user interface improvements are frequently subtle but can save researchers considerable time. Some of the more significant improvements are listed here.
Ensembl genome annotation and comparative analysis
The quality of the annotation produced by the core automatic gene‐building system has continued to improve, with builds delivered on seven genome assemblies during the year. The most recent is the first version of the finished human genome sequence (NCBI33), announced in April, for which pseudogenes have also been predicted automatically. In parallel with gene building, comparative analysis is now routinely carried out for each new assembly. DNA synteny is computed between human, mouse and rat, and putative gene orthologues are generated automatically between all five vertebrates, between the two worms and between the two insects.
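The paper does not describe how the comparative analysis pipeline infers putative orthologues. A common heuristic for this kind of pairwise prediction is the reciprocal best hit. Under this heuristic, genes a and b are paired when b is a's highest‐scoring match in the other genome and a is b's. The Python sketch below illustrates that heuristic only. It is a swapped‐in technique, not Ensembl's method. The gene identifiers and scores are hypothetical stand‐ins for, say, BLAST scores between two proteomes.

```python
from typing import Dict, List, Tuple

# A similarity hit: (query_gene, subject_gene, score); a higher score is a better hit.
Hit = Tuple[str, str, float]


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Map each query gene to its single highest-scoring subject gene."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        # Strict '>' keeps the first hit seen when scores tie.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> List[Tuple[str, str]]:
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical gene identifiers and scores, for illustration only.
human_vs_mouse = [("hA", "mA", 950.0), ("hA", "mB", 300.0),
                  ("hB", "mB", 800.0), ("hC", "mB", 820.0)]
mouse_vs_human = [("mA", "hA", 940.0), ("mB", "hC", 815.0), ("mB", "hB", 790.0)]
print(reciprocal_best_hits(human_vs_mouse, mouse_vs_human))
# → [('hA', 'mA'), ('hC', 'mB')]
```

In this example hB receives no partner, because mB's best hit back is hC. This strictness keeps false pairings down, but it misses one‐to‐many relationships, for example after a gene duplication in one lineage. Real orthology pipelines therefore often add further steps.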
Ensembl website
Last year’s move to the new schema enabled significant enhancements to the Ensembl webviews. These include a fourth, basepair‐level panel in Contigview that shows the nucleotide sequence, its six‐frame amino acid translation and restriction enzyme sites. During the build of the Ensembl‐lite database (a denormalized database that speeds up web access), SNP data are now further pre‐processed relative to other annotation. This has allowed Contigview, Transview and Protview to show SNPs against transcripts and their protein products, with coding SNPs labelled as synonymous or non‐synonymous. Other enhancements to Contigview include labelled syntenic blocks on the overview panel and access to a new interface, Dotterview, from the DNA conservation tracks on the detailed view panel. Dotterview is a web interface to the program Dotter. It shows a dotplot of DNA similarity between two genomes, by default over a 10 kb window, together with Ensembl annotation. The interface for adding DAS (8) sources to Contigview has also been developed further, giving users much greater control over how each source is displayed.
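The basepair‐level panel and the coding SNP labels rest on a few standard sequence operations. The following Python is a minimal sketch of three of them:

- six‐frame translation;
- a single‐strand scan for a restriction site, using EcoRI's GAATTC as the example;
- classification of a coding substitution as synonymous or non‐synonymous.

The sketch assumes DNA in upper or lower case and the standard genetic code. For SNP classification, it also assumes a spliced coding sequence on the forward strand. It is illustrative only and is not Ensembl code. All function names are hypothetical.

```python
from typing import Dict, List

# Standard genetic code; codons are enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; '*' marks a stop, 'X' an unknown codon (e.g. with N)."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame(seq: str) -> Dict[str, str]:
    """Frames +1..+3 read the given strand; -1..-3 read its reverse complement."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames: Dict[str, str] = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def find_sites(seq: str, site: str) -> List[int]:
    """0-based start positions of a restriction site on the given strand (overlaps allowed)."""
    seq, site = seq.upper(), site.upper()
    positions, start = [], seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def classify_coding_snp(cds: str, pos: int, alt: str) -> str:
    """Classify a substitution at 0-based position pos of a spliced, forward-strand CDS."""
    cds, alt = cds.upper(), alt.upper()
    codon_start = pos - pos % 3
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:pos % 3] + alt + ref_codon[pos % 3 + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "non-synonymous"


print(six_frame("ATGGCC"))
# → {'+1': 'MA', '-1': 'GH', '+2': 'W', '-2': 'A', '+3': 'G', '-3': 'P'}
print(find_sites("GAATTCGAATTC", "GAATTC"))      # → [0, 6]
print(classify_coding_snp("ATGGAAAAA", 5, "G"))  # GAA→GAG, Glu→Glu: synonymous
print(classify_coding_snp("ATGGAAAAA", 3, "C"))  # GAA→CAA, Glu→Gln: non-synonymous
```

`find_sites` scans one strand only. That is sufficient for palindromic sites such as GAATTC, but non‐palindromic sites would also need a scan of the reverse complement. A substitution that creates a stop codon is reported as non‐synonymous here.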
EnsemblMart: data mining for genomes
Ensembl has continued to import externally generated data sets and resources into its system. These are frequently available in Contigview via the DAS source menu, and many are also being incorporated into EnsemblMart as additional data mining indices. Examples include the STACK expression database and eVOC nomenclature (a collaboration with SANBI), rat QTLs, and microarray identifiers from Affymetrix and others. All of these data types can be queried through the Mart data mining interface, whose functionality has increased substantially over the year. It now has its own ‘What’s new’ web pages and offers features such as integration with the ArrayExpress microarray repository at the EBI.
Ensembl software system
The flexibility of the components of the Ensembl software system is increasingly leading to their reuse elsewhere. Within the Sanger Institute alone, the Ensembl pipeline is being used to support gene curation by both the Wormbase and Havana (vertebrate annotation) groups. Havana is also in the process of adopting the Otter database for storing its gene annotation. The Ensembl website code has been reused to power the Vega website (http://vega.sanger.ac.uk/), which shows curated annotation of vertebrate genomes collected from a number of annotation groups into a single database. Because Ensembl data are also served via DAS servers (8), they are being combined with other data in novel ways to provide specialist displays. The website code has already been reused to build Contigview‐like webviews of a virtual database composed entirely of different DAS sources.
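Combining DAS sources into a single view essentially means merging independently served feature lists on shared reference coordinates. The minimal Python sketch below merges such lists into one track and keeps only the features that overlap the requested segment. It assumes that each source's features have already been fetched and parsed into lists sorted by start position. Fetching and parsing DAS responses are omitted. The `Feature` type, the source names and the 1‐based inclusive coordinates are assumptions for illustration. They are not taken from the DAS specification or the Ensembl website code.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Feature(NamedTuple):
    start: int   # 1-based inclusive start on the reference segment (assumed convention)
    end: int     # 1-based inclusive end
    source: str  # hypothetical name of the DAS source that served the feature
    label: str


def merge_sources(sources: Iterable[List[Feature]], seg_start: int, seg_end: int) -> List[Feature]:
    """Merge per-source feature lists (each already sorted by start) into one
    start-ordered track, keeping only features overlapping [seg_start, seg_end]."""
    merged = heapq.merge(*sources, key=lambda f: f.start)
    return [f for f in merged if f.start <= seg_end and f.end >= seg_start]


# Two hypothetical sources covering the same reference segment.
genes = [Feature(100, 500, "genes-das", "GENE1"), Feature(900, 1200, "genes-das", "GENE2")]
snps = [Feature(150, 150, "snps-das", "snp_a"), Feature(2000, 2000, "snps-das", "snp_b")]
for f in merge_sources([genes, snps], 1, 1000):
    print(f.start, f.source, f.label)
# prints:
# 100 genes-das GENE1
# 150 snps-das snp_a
# 900 genes-das GENE2
```

`heapq.merge` consumes the inputs lazily. Merging n features from k sources therefore costs O(n log k), rather than re‐sorting the combined list.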
FUTURE DIRECTIONS
Ensembl remains focused on providing a genome information infrastructure that is useful to many researchers, principally via the web. As well as providing the baseline annotation for a number of genomes, Ensembl continually seeks to improve all aspects of its work, from software engineering through to data analysis. The year 2004 promises a number of new genomes (e.g. chicken, chimp and honey bee). It also promises continued improvements in technology and presentation, such as new views of cross‐species data organized around the putative gene orthologues predicted by the comparative analysis pipeline.
CONTACTING ENSEMBL
Ensembl is a joint project of the European Bioinformatics Institute (EBI) and the Wellcome Trust Sanger Institute (WTSI), both located on the Wellcome Trust Genome Campus, Cambridge, UK. To receive announcements about updates, subscribe to the ‘announce’ mailing list by sending the message ‘subscribe ensembl‐announce’ to majordomo@ebi.ac.uk. To follow the day‐to‐day development of Ensembl, subscribe to the ‘development’ mailing list by sending ‘subscribe ensembl‐dev’ to majordomo@ebi.ac.uk. Requests for information and support can be sent to helpdesk@ensembl.org, which is a fully supported helpdesk. The Ensembl website also provides extensive documentation, including installation guides and tutorials on using both the software system and the web interface.
ACKNOWLEDGEMENTS
We are grateful to users of our website and the developers on our mailing lists for much useful feedback and discussion. The Ensembl project is funded principally by the Wellcome Trust with additional funding from EMBL and NIH‐NIAID.
References
1. Waterston,R.H., Lindblad‐Toh,K., Birney,E., Rogers,J., Abril,J.F., Agarwal,P., Agarwala,R., Ainscough,R., Alexandersson,M., An,P. et al. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature, 420, 520–562.
2. Holt,R.A., Subramanian,G.M., Halpern,A., Sutton,G.G., Charlab,R., Nusskern,D.R., Wincker,P., Clark,A.G., Ribeiro,J.M., Wides,R. et al. (2002) The genome sequence of the malaria mosquito Anopheles gambiae. Science, 298, 129–149.
3. Birney,E., Clamp,M.E. and Hubbard,T.J. (2002) Databases and tools for browsing genomes. Annu. Rev. Genom. Hum. Genet., 3, 293–310.
4. Lewis,S.E., Searle,S.M., Harris,N., Gibson,M., Iyer,V., Richter,J., Wiel,C., Bayraktaroglu,L., Birney,E., Crosby,M.A. et al. (2002) Apollo: a sequence annotation editor. Genome Biol., 3, RESEARCH0082.
5. Hoon,S., Ratnapu,K.K., Chia,J.M., Kumarasamy,B., Juguang,X., Clamp,M., Stabenau,A., Potter,S., Clarke,L. and Stupka,E. (2003) Biopipe: a flexible framework for protocol‐based bioinformatics analysis. Genome Res., 13, 1904–1915.
6. Clamp,M. (2003) The Jalview Java Alignment Editor. Bioinformatics, in press.
7. Clamp,M., Andrews,D., Barker,D., Bevan,P., Cameron,G., Chen,Y., Clark,L., Cox,T., Cuff,J., Curwen,V. et al. (2003) Ensembl 2002: accommodating comparative genomics. Nucleic Acids Res., 31, 38–42.
8. Dowell,R.D., Jokerst,R.M., Day,A., Eddy,S.R. and Stein,L. (2001) The Distributed Annotation System. BMC Bioinformatics, 2, 7.
Oxford University Press