Ensembl 2012

Journal Article

1 European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK and 2 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

*To whom correspondence should be addressed. Tel: +44 1223 492581; Fax: +44 1223 494494; Email: flicek@ebi.ac.uk

Received: 10 October 2011; Accepted: 17 October 2011; Published: 15 November 2011

Paul Flicek, M. Ridwan Amode, Daniel Barrell, Kathryn Beal, Simon Brent, Denise Carvalho-Silva, Peter Clapham, Guy Coates, Susan Fairley, Stephen Fitzgerald, Laurent Gil, Leo Gordon, Maurice Hendrix, Thibaut Hourlier, Nathan Johnson, Andreas K. Kähäri, Damian Keefe, Stephen Keenan, Rhoda Kinsella, Monika Komorowska, Gautier Koscielny, Eugene Kulesha, Pontus Larsson, Ian Longden, William McLaren, Matthieu Muffato, Bert Overduin, Miguel Pignatelli, Bethan Pritchard, Harpreet Singh Riat, Graham R. S. Ritchie, Magali Ruffier, Michael Schuster, Daniel Sobral, Y. Amy Tang, Kieron Taylor, Stephen Trevanion, Jana Vandrovcova, Simon White, Mark Wilson, Steven P. Wilder, Bronwen L. Aken, Ewan Birney, Fiona Cunningham, Ian Dunham, Richard Durbin, Xosé M. Fernández-Suarez, Jennifer Harrow, Javier Herrero, Tim J. P. Hubbard, Anne Parker, Glenn Proctor, Giulietta Spudich, Jan Vogel, Andy Yates, Amonida Zadissa, Stephen M. J. Searle, Ensembl 2012, Nucleic Acids Research, Volume 40, Issue D1, 1 January 2012, Pages D84–D90, https://doi.org/10.1093/nar/gkr991

Abstract

The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii) bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project.

INTRODUCTION

The Ensembl project provides a genome browser at http://www.ensembl.org as well as integrated genome resources. The depth of genome information varies across supported species with the most comprehensive information provided for human, mouse, rat and zebrafish, which are also the most highly accessed genomes. For all species on the main site, we provide comprehensive, evidence-based gene annotations and comparative genomics resources including alignments and homology, orthology and paralogy relationships based on Ensembl GeneTrees (1). We integrate these annotations with a large number of external data sources including InterPro (2), UniProt (3) and Pfam (4). Eighteen of our most popular species also include dedicated variation resources (5) derived from dbSNP (6), DGVa (7) and other sources. The Ensembl regulatory build provides regulatory annotation on the human and mouse genomes and incorporates data from the ENCODE (8) and Roadmap Epigenomics Program (9).

In addition to the data available through the Ensembl website, we provide open access to the Ensembl API (10) and all supporting Ensembl databases to enable flexible, programmatic interaction with our data for use in genomic analysis. Data can also be accessed through the Ensembl BioMart (11,12). We support those who use multiple web-based genome bioinformatics sites by providing links to the UCSC Genome Browser (13) and NCBI's MapViewer (14) on all of our LocationView pages. We also support user data upload and visualization using BAM, BigWig, VCF and other common data formats (see http://www.ensembl.org/info/website/upload/index.html for further information and the most current list of supported upload formats).

Here we highlight some of Ensembl's new features and new data released in the last year. As with previous updates (15,16), we can only describe a subset of the information provided. Further information is available in the documentation section of the Ensembl website, from the Ensembl blog (http://www.ensembl.info) or by contacting helpdesk@ensembl.org.

RESULTS

Ensembl produces approximately five releases each year. Releases are numbered sequentially (September 2011 was release 64) and include newly supported species, new assemblies and new or updated annotations of already supported species. The Ensembl genome browser, code base and other genomic information described in this report are also updated with each release.

Gene annotation and supported species

Over the past year, five new species have been added to Ensembl. As of release 64, two of these species, white-cheeked gibbon and Tasmanian devil, are fully supported on our main site. Tasmanian devil is noteworthy because it was the second species (after zebrafish) in Ensembl to include RNAseq-based gene models. The Atlantic cod (Gadus morhua) was fully annotated and released on our preview site in conjunction with the genome article (17) and will be released on the main site in late 2011 as part of Ensembl release 65. Genome articles have been published for orang-utan (Pongo abelii) (18), anole lizard (Anolis carolinensis) (19) and tammar wallaby (Macropus eugenii) (20). The Nile tilapia (Oreochromis niloticus) and domestic ferret (Mustela putorius furo) are also available on our preview site. In addition to the new species, we released annotation on assemblies for sea lamprey (Petromyzon marinus), Western clawed frog (Xenopus tropicalis), cow (Bos taurus) and microbat (Myotis lucifugus). The microbat assembly has also moved from low to high coverage. An updated chimpanzee assembly, Pan_troglodytes-2.1.3, was released on our Pre! site. All new annotation projects are now released with a genebuild summary document on the species homepage, providing the user with detailed methods.

Gene annotation on our most popular species—human, mouse and zebrafish—has been fully updated this year. These annotations are a merged gene set consisting of the output of the Ensembl evidence-based automatic pipeline (21) and the manual annotation from the Havana project (22). For human, the major update to the gene set was a part of release 62 (April 2011), and the Ensembl/Havana merged gene set continues to be equivalent to the GENCODE gene set, the reference gene set for the ENCODE project (23). The Ensembl RNAseq annotation pipeline, first developed for zebrafish, was used to generate gene models for 16 human tissues using the Illumina Human BodyMap 2.0 data, produced on HiSeq 2000 instruments in 2010. The gene models based on this annotation have been made available to the user community in a dedicated RNAseq database and can also be viewed on the Ensembl website alongside intron supporting features that indicate the number of intron-spanning reads that have mapped to the transcript model. The zebrafish and mouse gene sets were updated, respectively, in release 60 (November 2010) and release 61 (February 2011), with zebrafish moving to the new Zv9 assembly.
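The intron supporting features mentioned above report how many reads span each intron. The sketch below shows one way such a count could be derived from spliced alignments. It assumes reads are supplied as (1-based start, CIGAR string) pairs in the SAM convention; the function name and the toy reads are hypothetical, and this is not the Ensembl RNAseq pipeline code.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM convention).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def intron_support(reads):
    """Count the reads that span each intron.

    reads: iterable of (start, cigar), where start is the 1-based leftmost
    reference position and cigar is a SAM CIGAR string. An 'N' operation
    marks a skipped reference region, i.e. an intron in a spliced alignment.
    Returns a Counter keyed by (intron_start, intron_end), 1-based inclusive.
    """
    counts = Counter()
    for start, cigar in reads:
        pos = start
        for length, op in CIGAR_OP.findall(cigar):
            length = int(length)
            if op == "N":
                counts[(pos, pos + length - 1)] += 1
            if op in "MDN=X":  # operations that consume reference bases
                pos += length
    return counts


# Toy input: two spliced reads sharing one intron, one unspliced read.
reads = [(100, "50M200N25M"), (120, "30M200N45M"), (400, "75M")]
support = intron_support(reads)
```

Here both spliced reads skip reference positions 150 to 349, so that intron receives a count of two and the unspliced read contributes nothing.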

The methods used to merge the Ensembl and Havana annotation are updated regularly as the input data evolve; for example, genes with biotype ‘lincRNA’ were introduced into Havana this year. We continue to be part of the Consensus Coding Sequence (CCDS) project (24) for human and mouse, and all current CCDS models are included in our gene sets.

Ensembl release 64 (September 2011) displays the full human GRCh37.p5 assembly produced in June 2011 by the Genome Reference Consortium (25). This patch includes 105 regions of which 40 are of the fix type and the rest are novel patches. By default, the primary assembly (GRCh37 chromosomes 1–22, X, Y) is displayed on the Ensembl website and these sequences are identical to those found on other genome browsers. Users can choose to display alternate loci (haplotypes, fix patches and novel patches) by selecting them in LocationView. We provide annotation on the GRC assembly patches and have developed a dedicated pipeline for this purpose where alignment-based annotation is combined with projected annotation from the primary assembly to provide the most complete annotation coverage of each patch. We are the only genome browser to integrate these alternate sequences into the primary assembly and to allow for visualization of the alternate sequences alongside the surrounding primary assembly (Figure 1).

Figure 1. A region of human chromosome 9 from the GRCh37.p5 assembly showing the fix patch applied in the ABO locus and displayed in genomic context (green region on the lower panel). In the upper panel, the full chromosome display shows the locations of a number of other fix and novel patches that are a part of the GRCh37.p5 assembly.

Comparative genomics

The gene set from each supported species is included in the Ensembl GeneTrees. As new species are added, we can better resolve the phylogenetic history of the genes. For instance, with the addition of the turkey gene set, we can now detect that 15% of the gene duplications that appeared to be specific to the chicken genome happened in the Phasianidae lineage (26).

To cater for the increase in the number of species, the orthologues view now includes a summary table that shows the distribution of one-to-one, one-to-many and many-to-many relationships among species. This table serves both as a summary and as a way to filter the results to, for example, all fishes.
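The summary table described above can be thought of as a per-species tally of relationship types that doubles as a filter. The following is a minimal sketch under that reading; the input rows, species names and the "fishes" grouping are made up for illustration, and the function names are hypothetical rather than part of the Ensembl code base.

```python
from collections import Counter, defaultdict


def summarise_orthologues(rows):
    """Tally orthologue relationship types per species.

    rows: iterable of (species, relationship_type) pairs, one per orthologue.
    Returns a dict mapping species to a Counter of relationship types.
    """
    table = defaultdict(Counter)
    for species, relationship in rows:
        table[species][relationship] += 1
    return table


def filter_species(table, wanted):
    """Keep only the rows of the summary table for the wanted species."""
    return {sp: counts for sp, counts in table.items() if sp in wanted}


# Toy orthologue list for a single query gene (illustrative values only).
rows = [
    ("zebrafish", "one-to-one"),
    ("zebrafish", "one-to-one"),
    ("zebrafish", "one-to-many"),
    ("medaka", "many-to-many"),
    ("mouse", "one-to-one"),
]
table = summarise_orthologues(rows)
fishes = filter_species(table, {"zebrafish", "medaka"})
```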

In addition to the five-way fish Enredo-Pecan-Ortheus (EPO), the set of whole-genome multiple alignments provided by Ensembl now includes a three-way avian EPO alignment (27,28) incorporating chicken, turkey and zebra finch. We use each of these alignments and GERP (29) to estimate a per-base sequence conservation score and to annotate regions of evolutionary constraint.

Ancestral alleles for the human genome are inferred from the six-way primate EPO alignments. In the EPO pipeline, Ortheus uses Pecan alignments to reconstruct the per-base history of the sequences. We use the most recent ancestor to call the ancestral allele for the whole human genome. These data are used in the 1000 Genomes Project (30) as well as in dbSNP to supplement their previous ancestral allele calls (31). The same approach was used to study the recent evolution in the primate lineage (32).
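As a rough illustration of the final step, the sketch below reads an ancestral allele for every human base from a pairwise view of the alignment between human and its most recent reconstructed ancestor. It is a simplification made under stated assumptions: the two sequences are already aligned with '-' for gaps, unresolved positions are reported as 'N', and the function name is hypothetical. The reconstruction itself (Ortheus) is not reproduced here.

```python
def call_ancestral(human_aln, ancestor_aln, unknown="N"):
    """Return one ancestral allele per human base.

    human_aln, ancestor_aln: equal-length aligned sequences using '-' for gaps.
    Columns where human has a gap carry no human coordinate and are skipped.
    Columns where the ancestor is a gap or not A/C/G/T are reported as unknown.
    """
    if len(human_aln) != len(ancestor_aln):
        raise ValueError("aligned sequences must have equal length")
    calls = []
    for human_base, ancestor_base in zip(human_aln, ancestor_aln):
        if human_base == "-":
            continue
        base = ancestor_base.upper()
        calls.append(base if base in ("A", "C", "G", "T") else unknown)
    return "".join(calls)


# Toy alignment: the second human base differs from the ancestor, the fourth
# column is absent from human, and the last ancestor base is missing.
ancestral = call_ancestral("ACG-TA", "ATGCT-")
```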

Regulation

Over the past year, we have continued to incorporate additional regulatory information into Ensembl from high-throughput sequence assays of chromatin samples. The data are processed by an integrated mapping and processing pipeline using the eHive system (33). As of Ensembl release 64, the Ensembl Regulation database contained 369 ChIP-seq and DNase-seq data sets from 10 human and 5 mouse cell lines. These included the genomic locations of binding regions for 74 different transcription factors (TFs) as well as the locations of sites for 40 modified histones, and an additional 26 data sets that identify regions of open chromatin or DNase I hypersensitivity. Twenty-five of the TFs have binding matrices available through the JASPAR database (34), and we provide the positions of high probability TF-binding sites within the binding regions based upon these matrices. From release 64, we have run separate peak calling methods tuned for either sharp, punctate signals (TFs or punctate histone modification ChIP-seq samples) or broad region signals (e.g. H3K36me3). These data are used as the basis for the Ensembl Regulatory Build that integrates peak calls with histone modifications on a cell-specific basis to provide a regulatory annotation of the genome. In human, the combined regulatory build across 10 cell lines annotates 228 Mb of genome in 442 258 regulatory features. The addition of data for TFs involved in RNA Polymerase 3 (Pol3) gene transcription has allowed broadening of the scope of annotation of regulatory features to include Pol3 gene associated features. Furthermore, we have recently added Distributed Annotation System (DAS) access to reduced representation bisulphite sequencing data generated by the ENCODE project annotating the extent of DNA methylation at hundreds of thousands of CpG dinucleotides across the genome in 44 human cell lines. 
The Ensembl Regulation database continues to provide mapping of probe sets for all the common microarray platforms including new arrays such as the Illumina Infinium methylation designs.
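Locating likely TF-binding sites within a binding region from a binding matrix is commonly done by sliding a log-odds position weight matrix along the sequence. The sketch below shows that general technique only. The count matrix is a toy, not a JASPAR profile, and the pseudocount, uniform background and threshold are assumptions rather than the settings used for the Ensembl annotation.

```python
import math


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2 odds scores.

    counts: list of dicts mapping base to count, one dict per motif position.
    """
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (0-based offset, score) for every window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(base not in ("A", "C", "G", "T") for base in window):
            continue  # skip windows containing N or other symbols
        score = sum(matrix[j][base] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits


# Toy motif that strongly prefers the sequence TATA.
counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
pwm = log_odds_matrix(counts)
hits = scan("GGTATACC", pwm, threshold=4.0)
```

Only the window starting at offset 2 matches the motif at every position, so it is the single hit above the threshold.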

Over the year, we have made a number of changes to the database and interface to provide improved performance and utility. Browser response time of the ‘multi-wiggle’ track type displays that were first introduced in 2010 (16) has been considerably improved by moving the signal data into compressed binary files stored outside of the main database, resulting in a more than 50-fold improvement in data load times. Control of the content and display of the signal and peak tracks for ChIP-seq and DNase-seq data in the Location and Regulatory Feature views has been refined by an enhanced matrix style interface that centralizes all display options (Figure 2). From release 64 onwards, we also store explicit links to raw data in the European Nucleotide Archive (ENA) with API support to provide better documentation of experimental samples. Finally, the Ensembl Regulation BioMart has been restructured to improve usability as well as to include more data.

Figure 2. Matrix style configuration panels for Open chromatin & Transcription Factor Binding Sites (on right) and Histones and Polymerases (on left). Both panels are accessible from the relevant portions of the Regulation section of the ‘Configure this page’ link found on the left menu of all location view pages.

Variation

Ensembl's variation data were updated with the major human data release from dbSNP 132. We also support three new species with the most recently released variation data from dbSNP: cat (dbSNP 131), opossum (dbSNP 131) and pig (dbSNP 128). In addition, the variation data for zebra finch, Tetraodon, horse, rat, zebrafish, cow and mouse were updated in the last year. Structural variation data are imported with each Ensembl release from DGVa (7) and now include the full structural variation data from the 1000 Genomes Pilot Project (30,35) in addition to structural variants from mouse, pig and dog. To facilitate browsing, the website separates data for 1000 Genomes Project variants and variation data with phenotypes in different tracks. New structural variation BioMart data sets for mouse, dog and pig have also been added.

As the amount of variation data has increased, so has the requirement to provide annotation of those variants subject to data problems or other concerns. For all data imported into the variation database, we continually update our methods for conducting quality and sanity checks and those data failing one of these checks are flagged with the reasons for concern. As of release 64, we flag as suspect a variant with any of the following characteristics: no genotypes with no alleles; four or more different alleles; no allele that matches the reference allele; alleles with ambiguity codes; mapped position incompatible with the reported alleles; no associated sequence; genotype frequencies which do not add up to 1; data that either do not map to the genome or have no genotypes; non-nucleotide alleles. Each variant is now checked by ssID so that a single problem submission can be effectively filtered from the database.
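Several of the checks listed above are simple predicates on a variant's alleles and genotype frequencies. The sketch below implements a subset of them to show how a variant might collect its reasons for concern. The function name, the treatment of '-' as a deletion allele and the numerical tolerance are assumptions, and the mapping-based checks are omitted because they need genome coordinates.

```python
NUCLEOTIDES = set("ACGT")
AMBIGUITY_CODES = set("RYSWKMBDHVN")  # IUPAC codes for more than one base


def suspect_reasons(ref_allele, alleles, genotype_freqs=None, tol=1e-6):
    """Return the list of reasons for flagging a variant as suspect.

    ref_allele: reference allele string.
    alleles: list of reported allele strings.
    genotype_freqs: optional list of genotype frequencies.
    An empty list means none of the implemented checks failed.
    """
    reasons = []
    alleles = [allele.upper() for allele in alleles]
    if len(set(alleles)) >= 4:
        reasons.append("four or more different alleles")
    if ref_allele.upper() not in alleles:
        reasons.append("no allele matches the reference allele")
    if any(set(allele) & AMBIGUITY_CODES for allele in alleles):
        reasons.append("alleles with ambiguity codes")
    # Assumption: '-' denotes a deletion allele and is acceptable.
    if any(set(allele) - NUCLEOTIDES - AMBIGUITY_CODES - {"-"} for allele in alleles):
        reasons.append("non-nucleotide alleles")
    if genotype_freqs is not None and abs(sum(genotype_freqs) - 1.0) > tol:
        reasons.append("genotype frequencies do not add up to 1")
    return reasons


clean = suspect_reasons("A", ["A", "G"], [0.25, 0.5, 0.25])
flagged = suspect_reasons("A", ["C", "N"])
```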

We continue to provide extensive data resources for disease and phenotype annotations for both germline variants and somatic mutations. As of release 64 (September 2011), a total of 174 681 distinct variants in Ensembl have phenotype annotation. These data include 43 272 somatic mutations from COSMIC (36) and 61 793 mutations from the public portion of the HGMD database (37). We also have more than 5000 phenotype–variant associations from the NHGRI GWAS catalog (38) and over 14 000 from OMIM.

We have made significant improvements to the variation consequence pipeline, which now supports allele-specific consequences, complete SIFT (39) and PolyPhen (40) predictions for the human proteome and new regulatory region consequences. All variation consequences are reported with standard sequence ontology terms (41) to ensure consistent definitions across browsers and facilitate comparison with other resources.

The Variant Effect Predictor (42) has also been extended to include these features and, as an option, to run as a stand-alone software program that does not require a network connection or Ensembl database. The integration of regulatory features with DNA variation data is particularly powerful. Overlap of variants with regulatory features is reported, as well as overlap with TF-binding sites and high information content bases within the binding site. This provides an additional level of annotation of variants identified in GWAS and other studies.
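The overlap reporting described above reduces to interval containment checks. The sketch below shows the idea for a single variant position. It is not the Variant Effect Predictor's implementation; the function name, feature identifiers and coordinates are hypothetical, and all coordinates are assumed to be 1-based, inclusive and on the same chromosome.

```python
def annotate_variant(position, regulatory_features, motifs):
    """Report the regulatory features and TF motifs a variant falls in.

    regulatory_features: list of (start, end, feature_id).
    motifs: list of (start, end, tf_name).
    For motifs, the 1-based offset of the variant within the motif is also
    returned, so that it could be compared against per-position information
    content in a later step.
    """
    return {
        "regulatory_features": [
            feature_id
            for start, end, feature_id in regulatory_features
            if start <= position <= end
        ],
        "motifs": [
            (tf_name, position - start + 1)
            for start, end, tf_name in motifs
            if start <= position <= end
        ],
    }


# Toy annotation: one variant inside a regulatory feature and a motif.
features = [(1000, 1500, "feature_1"), (3000, 3400, "feature_2")]
motifs = [(1200, 1211, "TF_A")]
result = annotate_variant(1205, features, motifs)
```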

Ensembl website and software infrastructure

This year saw the release of a third mirror of the Ensembl website in the Asia-Pacific region located at http://asia.ensembl.org. As with our other mirrors at http://useast.ensembl.org and http://uswest.ensembl.org, the Asia mirror uses Amazon Web Services (AWS) to provide the infrastructure (the USWest mirror was migrated to AWS Northern California data centre in 2011). By consolidating all of the supported Ensembl mirrors in AWS, we are able to provide consistent support and increased performance for users around the world. All users visiting the Ensembl website are automatically redirected to their nearest mirror, ensuring the best possible performance. For users accessing Ensembl data via our API or direct MySQL queries, we have also launched a second database server at useastdb.ensembl.org.
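For direct MySQL access, a query against the database server named above might be assembled as in the following sketch. The helper names are hypothetical. The database naming pattern (species, "core", release, assembly version), the "anonymous" user and the "gene" table are assumptions based on the conventions of the public Ensembl servers, and a port may need to be supplied. The code only builds the command and does not contact any server.

```python
def core_db_name(species, release, assembly_version):
    """Compose a core database name, assuming species_core_release_assembly."""
    return f"{species}_core_{release}_{assembly_version}"


def mysql_command(host, database, query, user="anonymous", port=None):
    """Build an argument list for the mysql command-line client."""
    cmd = ["mysql", f"--host={host}", f"--user={user}"]
    if port is not None:
        cmd.append(f"--port={port}")
    cmd.append(f"--execute={query}")
    cmd.append(database)  # the database name is the final positional argument
    return cmd


# Assumed example: human core database for release 64 on GRCh37.
db = core_db_name("homo_sapiens", 64, 37)
cmd = mysql_command("useastdb.ensembl.org", db, "SELECT COUNT(*) FROM gene")
```

Where a mysql client is installed, the resulting list could be passed to subprocess.run; older releases may no longer be hosted on the public servers.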

The last major update to the Ensembl web interface was release 51 (November 2008) (43,44). Over the past year, we have continued to focus on small but significant improvements to the web interface and on-going updates to the underlying software infrastructure. The latter included the deployment of Lucene, an open-source search engine that we also mirrored in AWS.

Display of the user's own data has been extended to include attachment of large indexed formats such as BAM, BigWig and VCF. For file types such as BED or GFF, the file upload process has been improved to give a count of features parsed from the file and to provide a link to sample coordinates where data can be viewed. Improvements to the discoverability of content have also been made, with a redesigned masthead containing links to popular content and the rewriting of some text-heavy pages to provide clear links enhanced by graphics. When users log in to their personal Ensembl accounts, each tab in the masthead also has dropdown menus giving quick access to other species and recently visited locations, genes, variations and regulatory features.
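The feature count and sample coordinates reported after an upload can be illustrated with a small BED parser. This is a sketch of the idea rather than the website's upload code: the function name is hypothetical, and it assumes whitespace-separated BED lines with 0-based, half-open coordinates that are converted to a 1-based inclusive location string.

```python
def summarise_bed(lines):
    """Count BED features and return the location of the first one.

    lines: iterable of text lines from a BED file.
    Returns (feature_count, first_location), where first_location is a
    'chrom:start-end' string in 1-based inclusive coordinates, or None
    if no features were parsed.
    """
    count = 0
    first_location = None
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        count += 1
        if first_location is None:
            first_location = f"{chrom}:{start + 1}-{end}"
    return count, first_location


# Toy upload with a header line and two features (illustrative values).
bed_lines = [
    "track name=example",
    "9\t999\t2000\tfeature_a",
    "9\t4999\t6000\tfeature_b",
]
n_features, sample = summarise_bed(bed_lines)
```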

Finally, design of the pop-up configuration panel has been improved with the aid of user input. Favorite tracks can be saved, and a new graphical matrix allows quick and intuitive selection of cell/tissue data for regulatory tracks.

Ensembl user support

Ensembl continues to support users through our outreach activities, in addition to targeting new users and getting first-hand feedback about our resources from scientists. Our trainers have held workshops in 46 countries, and the 90 workshops given in 2011 included both Japan and Eastern Europe.

Documentation for the Ensembl API was updated this year to properly describe our inheritance model. This required the deployment of a new documentation system based on Doxygen (http://www.doxygen.org). The full API documentation is available at http://www.ensembl.org/info/docs/api/index.html.

This year also included new efforts to reach users through the launch of our Facebook site at www.facebook.com/Ensembl.org, where we post popular weekly navigation tips and other information about the project. Our Facebook page joins our blog, Twitter feed and YouTube channel, which features 12 training videos focused on specific uses of the Ensembl website. Finally, we launched a beginner Ensembl browser course as part of the EBI e-learning platform at http://www.ensembl.info/ecourse. The course allows new users to learn the basic navigation of the browser without having to attend a workshop.

FUNDING

The Wellcome Trust provides majority funding for the Ensembl project (WT062023 and WT079643) with additional funding from the National Human Genome Research Institute (U01HG004695, U54HG004563 and U41HG006104) and the European Molecular Biology Laboratory. Additional support as specified: Funded by the European Commission under SLING, grant agreement number 226073 (Integrating Activity) within Research Infrastructures of the FP7 Capacities Specific Programme; The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 222664 (‘Quantomics’). This Publication reflects only the author's views and the European Community is not liable for any use that may be made of the information contained herein; The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754, the GEN2PHEN project; The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under the grant agreement n° 223210 CISSTEM. Funding for open access charge: The Wellcome Trust.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

We thank the members of our Scientific Advisory Board and all of our users. We are especially thankful to those who take the time to contact us through our mailing lists and blog. We acknowledge those researchers, organizations and large-scale projects that have provided data to Ensembl prior to publication under the understandings of the Fort Lauderdale meeting discussing Community Resource Projects and the Toronto meeting on prepublication data sharing (45).

REFERENCES

1. EnsemblCompara GeneTrees: complete, duplication-aware phylogenetic trees in vertebrates, Genome Res., 2009, vol. 19 (pg. 327-335)

2. et al. InterPro: the integrative protein signature database, Nucleic Acids Res., 2009, vol. 37 (pg. D211-D215)

3. UniProt Consortium. The Universal Protein Resource (UniProt) in 2010, Nucleic Acids Res., 2010, vol. 38 (pg. D142-D148)

4. et al. The Pfam protein families database, Nucleic Acids Res., 2010, vol. 38 (pg. D211-D222)

5. et al. Ensembl Variation Resources, BMC Genomics, 2010, vol. 11 pg. 293

6. NCBI dbSNP Database: content and searching, Genetic Variation: A Laboratory Manual, 2007, Cold Spring Harbor, NY, USA, Cold Spring Harbor Laboratory Press (pg. 41-61)

7. et al. Public data archives for genomic structural variation, Nat. Genet., 2010, vol. 42 (pg. 813-814)

8. The ENCODE Project Consortium. A User's guide to the encyclopedia of DNA elements (ENCODE), PLoS Biol., 2011, vol. 9 pg. e1001046

9. et al. The NIH Roadmap Epigenomics Mapping Consortium, Nat. Biotechnol., 2010, vol. 28 (pg. 1045-1048)

10. The Ensembl core software libraries, Genome Res., 2004, vol. 14 (pg. 929-933)

11. et al. Ensembl BioMarts: a hub for data retrieval across taxonomic space, Database (Oxford), 2011, doi: 10.1093/database/bar030

12. BioMart–biological queries made easy, BMC Genomics, 2009, vol. 10 pg. 22

13. et al. The UCSC Genome Browser database: update 2011, Nucleic Acids Res., 2011, vol. 39 (pg. D876-D882)

14. et al. Database resources of the National Center for Biotechnology Information, Nucleic Acids Res., 2011, vol. 39 (pg. D38-D51)

15. et al. Ensembl's 10th year, Nucleic Acids Res., 2010, vol. 38 (pg. D557-D562)

16. et al. Ensembl 2011, Nucleic Acids Res., 2011, vol. 39 (pg. D800-D806)

17. et al. The genome sequence of Atlantic cod reveals a unique immune system, Nature, 2011, vol. 477 (pg. 207-210)

18. et al. Comparative and demographic analysis of orang-utan genomes, Nature, 2011, vol. 469 (pg. 529-533)

19. et al. The genome of the green anole lizard and a comparative analysis with birds and mammals, Nature, 2011, vol. 477 (pg. 587-591)

20. et al. Genome sequence of an Australian kangaroo, Macropus eugenii, provides insight into the evolution of mammalian reproduction and development, Genome Biol., 2011, vol. 12 pg. R81

21. The Ensembl automatic gene annotation system, Genome Res., 2004, vol. 14 (pg. 942-950)

22. The vertebrate genome annotation (Vega) database, Nucleic Acids Res., 2008, vol. 36 (pg. D753-D760)

23. et al. GENCODE: producing a reference annotation for ENCODE, Genome Biol., 2006, vol. 7 Suppl. 1 (pg. S4.1-S4.9)

24. et al. The consensus coding sequence (CCDS) project: identifying a common protein-coding gene set for the human and mouse genomes, Genome Res., 2009, vol. 19 (pg. 1316-1323)

25. et al. Modernizing reference genome assemblies, PLoS Biol., 2011, vol. 9 pg. e1001091

26. et al. Multi-platform next-generation sequencing of the domestic Turkey (Meleagris gallopavo): genome assembly and analysis, PLoS Biol., 2010, vol. 8 pg. e1000475

27. Enredo and Pecan: genome-wide mammalian consistency-based multiple alignment with paralogs, Genome Res., 2008, vol. 18 (pg. 1814-1828)

28. Genome-wide nucleotide-level mammalian ancestor reconstruction, Genome Res., 2008, vol. 18 (pg. 1829-1843)

29. NISC Comparative Sequencing Program, Distribution and intensity of constraint in mammalian genomic sequence, Genome Res., 2005, vol. 15 (pg. 901-913)

30. 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing, Nature, 2010, vol. 467 (pg. 1061-1073)

31. The influence of recombination on human genetic diversity, PLoS Genet., 2006, vol. 2 pg. e148

32. Ongoing GC-biased evolution is widespread in the human genome and enriched near recombination hot spots, Genome Biol Evol., 2011, vol. 3 (pg. 614-626)

33. eHive: An Artificial Intelligence workflow system for genomic analysis, BMC Bioinformatics, 2010, vol. 11 pg. 240

34. et al. JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles, Nucleic Acids Res., 2010, vol. 38 (pg. D105-D110)

35. et al. Mapping copy number variation by population-scale genome sequencing, Nature, 2011, vol. 470 (pg. 59-65)

36. et al. COSMIC (the catalogue of Somatic mutations in cancer): a resource to investigate acquired mutations in human cancer, Nucleic Acids Res., 2010, vol. 38 (pg. D652-D657)

37. The Human Gene Mutation Database (HGMD) and its exploitation in the study of mutational mechanisms, Curr. Protoc. Bioinformatics, 2006, Chapter 1, Unit 1.13

38. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits, Proc. Natl Acad. Sci. USA, 2009, vol. 106 (pg. 9362-9367)

39. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm, Nat. Protoc., 2009, vol. 4 (pg. 1073-1081)

40. A method and server for predicting damaging missense mutations, Nat. Methods, 2010, vol. 7 (pg. 248-249)

41. The Sequence Ontology: a tool for the unification of genome annotations, Genome Biol., 2005, vol. 6 pg. R44

42. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor, Bioinformatics, 2010, vol. 26 (pg. 2069-2070)

43. et al. Ensembl 2009, Nucleic Acids Res., 2009, vol. 37 (pg. D690-D697)

44. Using caching and optimization techniques to improve performance of the Ensembl website, BMC Bioinformatics, 2010, vol. 11 pg. 239

45. Toronto International Data Release Workshop Authors. Prepublication data sharing, Nature, 2009, vol. 461 (pg. 168-170)

© The Author(s) 2011. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
