Ensembl 2004

Journal Article

*To whom correspondence should be addressed. Tel: +44 1223 494983; Fax: +44 1223 494919; Email: th@sanger.ac.uk

Published: 01 January 2004

E. Birney, D. Andrews, P. Bevan, M. Caccamo, G. Cameron, Y. Chen, L. Clarke, G. Coates, T. Cox, J. Cuff, V. Curwen, T. Cutts, T. Down, R. Durbin, E. Eyras, X. M. Fernandez‐Suarez, P. Gane, B. Gibbins, J. Gilbert, M. Hammond, H. Hotz, V. Iyer, A. Kahari, K. Jekosch, A. Kasprzyk, D. Keefe, S. Keenan, H. Lehvaslaiho, G. McVicker, C. Melsopp, P. Meidl, E. Mongin, R. Pettett, S. Potter, G. Proctor, M. Rae, S. Searle, G. Slater, D. Smedley, J. Smith, W. Spooner, A. Stabenau, J. Stalker, R. Storey, A. Ureta‐Vidal, C. Woodwark, M. Clamp, T. Hubbard, Ensembl 2004, Nucleic Acids Research, Volume 32, Issue suppl_1, 1 January 2004, Pages D468–D470, https://doi.org/10.1093/nar/gkh038

Abstract

The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organize biology around the sequences of large genomes. It is a comprehensive and integrated source of annotation of large genome sequences, available via an interactive website, web services or flat files. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. The facilities of the system range from sequence analysis to data storage and visualization, and installations exist around the world both in companies and at academic sites. With a total of nine genome sequences available from Ensembl and more genomes to follow, recent developments have focused mainly on closer integration between genomes and external data.

Received September 16, 2003; Accepted September 18, 2003

INTRODUCTION

Genome sequences provide a natural framework around which to organize biological data. In the short time in which they have been available, genome databases have proved invaluable resources to researchers. Ensembl provides one of the most popular sources of automatic analysis and integration of large genome sequence data and is a joint project between the EBI and the Sanger Institute. It now contains nine genomes: five vertebrates (human, mouse, rat, fugu and zebrafish), two worms (Caenorhabditis briggsae and Caenorhabditis elegans) and two insects (Drosophila melanogaster and Anopheles gambiae). Ensembl has been involved in the continued analysis of human data, analysis of the mouse genome (1), analysis of the A. gambiae genome (2) and the C. briggsae genome. Ensembl gene predictions have also formed the core set of annotations for the forthcoming rat genome analysis. Ensembl remains an entirely open project with all data freely available and code openly licensed. Ensembl has built a strong network of developers among its users in both academia and industry, and is being installed both to mirror Ensembl‐generated data and to serve as a software foundation for user projects. Several papers describing specific aspects of Ensembl have recently been submitted (3–6). This paper briefly outlines some of the developments of the project since the report last year (7).

NEW DEVELOPMENTS

Regular update cycle

To streamline the handling of this ever changing and increasing amount of data, from February 2003, Ensembl adopted a monthly release cycle, allowing improvements to the web interface and database schema to be released monthly, with new data being incorporated as it became available. Database dumps and flat files are released in sync with updates to the website.

Pre‐ensembl website

A full Ensembl annotation of a genome takes some weeks to complete. To provide users with immediate access to newly released genome assemblies, Ensembl now offers a pre‐ensembl website (http://pre.ensembl.org/) with limited functionality. This can be made available only a few days after the release of the genome and provides BLAST and SSAHA searching, placement of all known proteins, repeat masking and ab initio gene predictions.

Otter: an extended Ensembl schema for gene curation

During the year, Ensembl developed a new software component called Otter. Otter is an Ensembl database, but with an extended schema and an associated client/server system to support manual gene annotation. The Sanger Institute vertebrate annotation system is being migrated to use Otter, which will then put both automatic (Ensembl) and manual annotation under a single software framework and help greatly with subsequent data integration. The Otter server communicates with annotation clients via an XML format, which allows easy exchange and verification of annotation generated with different systems.
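As a rough illustration of why an XML exchange format makes verification straightforward, the sketch below parses a small annotation document and checks a few basic invariants. The element and attribute names used here (`annotation`, `gene`, `transcript`, `exon`, `start`, `end`, `strand`) and the function `verify_annotation` are hypothetical, chosen for illustration only; they are not the actual Otter XML schema or API.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation document; NOT the real Otter XML schema.
SAMPLE = """
<annotation assembly="example">
  <gene id="gene1" strand="+">
    <transcript id="tr1">
      <exon start="100" end="200"/>
      <exon start="300" end="450"/>
    </transcript>
  </gene>
  <gene id="gene2" strand="-">
    <transcript id="tr2">
      <exon start="900" end="850"/>
    </transcript>
  </gene>
</annotation>
"""


def verify_annotation(xml_text):
    """Return a list of human-readable problems found in the document."""
    problems = []
    root = ET.fromstring(xml_text.strip())
    for gene in root.iter("gene"):
        gene_id = gene.get("id", "<missing id>")
        if gene.get("strand") not in ("+", "-"):
            problems.append(f"{gene_id}: strand must be '+' or '-'")
        for transcript in gene.iter("transcript"):
            tr_id = transcript.get("id", "<missing id>")
            previous_end = 0
            for exon in transcript.iter("exon"):
                start = int(exon.get("start"))
                end = int(exon.get("end"))
                # Each exon must be a well-formed interval.
                if start > end:
                    problems.append(f"{tr_id}: exon {start}-{end} has start > end")
                # Exons are expected in ascending, non-overlapping order.
                if start <= previous_end:
                    problems.append(
                        f"{tr_id}: exon {start}-{end} overlaps or precedes the previous exon"
                    )
                previous_end = max(previous_end, end)
    return problems


if __name__ == "__main__":
    for problem in verify_annotation(SAMPLE):
        print(problem)
```

Because every client produces the same document structure, a check of this kind can be run on annotation regardless of which system generated it.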

The Apollo genome browser (4), a GMOD component (http://www.gmod.org/) under joint development by Ensembl and the Berkeley Drosophila genome project (http://www.bdgp.org/), can be used as an annotation client for Otter. Apollo has also been extended to display data from DAS (distributed annotation system) servers. As an editor, Apollo has the advantage of being able to view and edit annotation in a comparative genomic context: by connecting to two Otter servers (e.g. human and mouse) and an Ensembl compara database containing pre‐calculated synteny information between the two genomes, it is possible to view annotation for both genomes and edit each in the context of the synteny with the other.

ENHANCEMENTS

Other than these new developments, there have been continuous enhancements to existing features of Ensembl over the year. Users are encouraged to read the ‘What’s new’ pages accompanying every release, as user interface improvements are frequently subtle but can save researchers considerable time. Some of the more significant improvements are listed here.

Ensembl genome annotation and comparative analysis

The quality of the annotation produced by the core automatic gene building system has continued to improve, with builds delivered on seven genome assemblies during the year. The most recent is the first version of the finished human genome sequence (NCBI33), announced in April, which also has pseudogenes automatically predicted. In parallel with gene building, comparative analysis is now routinely carried out for each new assembly. DNA synteny is generated between human, mouse and rat, and putative gene orthologues are automatically generated between all five vertebrates, between the two worms and between the two insects.

Ensembl website

Last year’s move to the new schema enabled the development of significant enhancements to the Ensembl webviews. These include the addition of a fourth, basepair‐level panel to Contigview, showing nucleotide sequence, six‐frame amino acid translation and restriction enzyme site features. Additional pre‐processing of SNP data with respect to other annotation, carried out during the building of the Ensembl‐lite database (a denormalized database to speed web access), has allowed Contigview, Transview and Protview to be extended to show SNPs against transcripts and their protein products, including labelling of synonymous and non‐synonymous coding SNPs. Other enhancements to Contigview include labelled syntenic blocks shown on the overview panel and access to a new interface, Dotterview, from DNA conservation tracks on the detailed view panel. Dotterview is a web interface to the program Dotter, showing a dotplot of DNA similarity between two genomes, by default over a 10 kb window, with Ensembl annotation. The interface for adding DAS (8) sources to Contigview has continued to be developed, giving the user much greater control over the display of each source.
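To make two of the displayed features concrete, the following minimal sketch computes a six‐frame translation of a DNA sequence and labels a coding SNP as synonymous or non‐synonymous using the standard genetic code. It is an illustration under stated assumptions, not the Ensembl implementation: the function names (`six_frame_translation`, `classify_coding_snp`) are hypothetical, the SNP position is taken to be 0‐based within an in‐frame coding sequence, and ambiguity codes are simply rendered as `X`.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Map frame labels (+1..+3, -1..-3) to their protein translations."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def classify_coding_snp(cds, position, alt_base):
    """Label a substitution at a 0-based CDS position (assumed in frame)."""
    codon_start = position - position % 3
    ref_codon = cds[codon_start:codon_start + 3]
    offset = position - codon_start
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "non-synonymous"


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
    print(classify_coding_snp("ATGGCCTAA", 5, "T"))  # GCC -> GCT
    print(classify_coding_snp("ATGGCCTAA", 4, "A"))  # GCC -> GAC
```

Classifying SNPs against every transcript in this way is the kind of work that can be done once, ahead of time, so that the web views only have to look up the stored label.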

EnsemblMart: data mining for genomes

Ensembl has continued to import new externally generated data sets and resources into its system. These are frequently available in Contigview via the DAS source menu; however, many are also being incorporated into EnsemblMart as additional data mining indices. Examples include the STACK expression database eVOC nomenclature (in collaboration with SANBI), rat QTLs, and microarray identifiers from Affymetrix and others. All of these data types are queryable via the Mart data mining interface, which has increased substantially in functionality over the year, now has its own ‘What’s new’ web pages and includes features such as integration with the ArrayExpress microarray repository at EBI.
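One way to picture a data mining index of this kind is as a wide table in which each imported data type becomes a filterable column, so that a query combining several filters needs no joins. The sketch below uses an in‐memory SQLite database to show the idea. The table name `gene_main`, its columns, the toy rows and the helper `query_genes` are all hypothetical and invented for illustration; they do not reflect the actual EnsemblMart schema or data.

```python
import sqlite3


def build_toy_mart():
    """Create an in-memory table with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE gene_main (
               gene_id TEXT,
               chromosome TEXT,
               start_bp INTEGER,
               end_bp INTEGER,
               array_probe TEXT,      -- hypothetical microarray identifier
               expression_term TEXT   -- hypothetical expression vocabulary term
           )"""
    )
    rows = [
        ("GENE_A", "1", 1000, 5000, "probe_001", "liver"),
        ("GENE_B", "1", 8000, 9500, None, "brain"),
        ("GENE_C", "2", 200, 1200, "probe_002", "liver"),
    ]
    conn.executemany("INSERT INTO gene_main VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def query_genes(conn, chromosome=None, expression_term=None, has_probe=False):
    """Combine optional filters into one parameterized single-table query."""
    clauses, params = [], []
    if chromosome is not None:
        clauses.append("chromosome = ?")
        params.append(chromosome)
    if expression_term is not None:
        clauses.append("expression_term = ?")
        params.append(expression_term)
    if has_probe:
        clauses.append("array_probe IS NOT NULL")
    sql = "SELECT gene_id FROM gene_main"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY gene_id"
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    mart = build_toy_mart()
    print(query_genes(mart, expression_term="liver", has_probe=True))
    print(query_genes(mart, chromosome="1"))
```

Adding a new external data set then amounts to adding a column (or a similar flat table) rather than changing the query logic.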

Ensembl software system

The flexibility of the components of the Ensembl software system is increasingly leading to their reuse elsewhere. Within the Sanger Institute alone, the Ensembl pipeline is being used to support gene curation by both the Wormbase and Havana (vertebrate annotation) groups. Havana is also in the process of making use of the Otter database for storing its gene annotation. The Ensembl website code has been reused to power the Vega website (http://vega.sanger.ac.uk/), which shows curated annotation of vertebrate genomes collected from a number of annotation groups into a single database. The fact that Ensembl data are also being served via DAS servers (8) is encouraging data to be combined in novel ways to provide specialist data displays. The website code has already been reused to build Contigview‐like webviews of a virtual database composed entirely of different DAS sources.
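Combining sources in this way is possible because a DAS client only needs to request features for a genomic segment over HTTP and read back a simple XML document. The sketch below builds such a request URL and parses a hand‐written response fragment. The URL pattern and element names (`SEGMENT`, `FEATURE`, `TYPE`, `START`, `END`, `ORIENTATION`) follow our reading of the DAS 1 features command and should be treated as assumptions; the server `das.example.org`, the data source `example_source` and the sample features are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def das_features_url(server, data_source, segment, start, stop):
    """Build a features request for one segment (assumed DAS 1 URL layout)."""
    query = urlencode({"segment": f"{segment}:{start},{stop}"}, safe=":,")
    return f"{server.rstrip('/')}/das/{data_source}/features?{query}"


# Hand-written response fragment for illustration; not real server output.
SAMPLE_RESPONSE = """<DASGFF>
  <GFF version="1.0" href="http://das.example.org/das/example_source/features">
    <SEGMENT id="1" start="1000" stop="2000">
      <FEATURE id="feat1" label="feature one">
        <TYPE id="exon">exon</TYPE>
        <START>1100</START>
        <END>1250</END>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
      <FEATURE id="feat2" label="feature two">
        <TYPE id="repeat">repeat</TYPE>
        <START>1500</START>
        <END>1800</END>
        <ORIENTATION>-</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def parse_features(xml_text):
    """Flatten a features response into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            features.append(
                {
                    "segment": segment.get("id"),
                    "id": feature.get("id"),
                    "type": feature.findtext("TYPE"),
                    "start": int(feature.findtext("START")),
                    "end": int(feature.findtext("END")),
                    "orientation": feature.findtext("ORIENTATION"),
                }
            )
    return features


if __name__ == "__main__":
    print(das_features_url("http://das.example.org", "example_source", "1", 1000, 2000))
    for feature in parse_features(SAMPLE_RESPONSE):
        print(feature)
```

A display built on several DAS sources would issue one such request per source for the same segment and merge the resulting feature lists into tracks.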

FUTURE DIRECTIONS

Ensembl remains focused on providing a genome information infrastructure of use to many researchers, principally via the web. As well as providing the baseline annotation for a number of genomes, Ensembl is continuously trying to improve all aspects of its work, from software engineering through to data analysis. 2004 promises a number of new genomes (e.g. chicken, chimp and honey bee) but also continued technology and presentation improvements, such as new views of cross‐species data, organized around the putative gene orthologues predicted by the comparative analysis pipeline.

CONTACTING ENSEMBL

Ensembl is a joint project of the European Bioinformatics Institute (EBI) and the Wellcome Trust Sanger Institute (WTSI), both of which are located on the Wellcome Trust Genome Campus, Cambridge, UK. To receive announcements about updates, subscribe to the ‘announce’ mailing list: majordomo@ebi.ac.uk ‘subscribe ensembl‐announce’. To follow the day‐to‐day development of Ensembl, subscribe to the ‘development’ mailing list: majordomo@ebi.ac.uk ‘subscribe ensembl‐dev’. Requests for information and support can be sent to helpdesk@ensembl.org, which is a fully supported helpdesk. Extensive additional documentation, including installation guides and tutorials on using both the software system and the web interface, can be found on the Ensembl website.

ACKNOWLEDGEMENTS

We are grateful to users of our website and the developers on our mailing lists for much useful feedback and discussion. The Ensembl project is funded principally by the Wellcome Trust with additional funding from EMBL and NIH‐NIAID.

References

1. Waterston,R.H., Lindblad‐Toh,K., Birney,E., Rogers,J., Abril,J.F., Agarwal,P., Agarwala,R., Ainscough,R., Alexandersson,M., An,P. et al. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature, 420, 520–562.

2. Holt,R.A., Subramanian,G.M., Halpern,A., Sutton,G.G., Charlab,R., Nusskern,D.R., Wincker,P., Clark,A.G., Ribeiro,J.M., Wides,R. et al. (2002) The genome sequence of the malaria mosquito Anopheles gambiae. Science, 298, 129–149.

3. Birney,E., Clamp,M.E. and Hubbard,T.J. (2002) Databases and tools for browsing genomes. Annu. Rev. Genom. Hum. Genet., 293–310.

4. Lewis,S.E., Searle,S.M., Harris,N., Gibson,M., Iyer,V., Richter,J., Wiel,C., Bayraktaroglu,L., Birney,E., Crosby,M.A. et al. (2002) Apollo: a sequence annotation editor. Genome Biol., RESEARCH0082.

5. Hoon,S., Ratnapu,K.K., Chia,J.M., Kumarasamy,B., Juguang,X., Clamp,M., Stabenau,A., Potter,S., Clarke,L. and Stupka,E. (2003) Biopipe: a flexible framework for protocol‐based bioinformatics analysis. Genome Res., 1904–1915.

6. Clamp,M. (2003) The Jalview Java Alignment Editor. Bioinformatics, in press.

7. Clamp,M., Andrews,D., Barker,D., Bevan,P., Cameron,G., Chen,Y., Clarke,L., Cox,T., Cuff,J., Curwen,V. et al. (2003) Ensembl 2002: accommodating comparative genomics. Nucleic Acids Res., –42.

8. Dowell,R.D., Jokerst,R.M., Day,A., Eddy,S.R. and Stein,L. (2001) The Distributed Annotation System. BMC Bioinformatics.

Oxford University Press
