Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor


¹European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and ²Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK

* To whom correspondence should be addressed.


Revision received: 27 May 2010


William McLaren, Bethan Pritchard, Daniel Rios, Yuan Chen, Paul Flicek, Fiona Cunningham, Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor, Bioinformatics, Volume 26, Issue 16, August 2010, Pages 2069–2070, https://doi.org/10.1093/bioinformatics/btq330

Abstract

Summary: A tool that predicts the effect of newly discovered genomic variants on known transcripts is indispensable for prioritizing and categorizing such variants. In Ensembl, a web-based tool (the SNP Effect Predictor) and the API can now functionally annotate variants in all species supported by Ensembl and Ensembl Genomes.

Availability: The Ensembl SNP Effect Predictor can be accessed via the Ensembl website at http://www.ensembl.org/. The Ensembl API is open source software; installation instructions are available at http://www.ensembl.org/info/docs/api/api_installation.html.

Contact: wm2@ebi.ac.uk; fiona@ebi.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

As the costs of resequencing and genotyping fall, increasing amounts of variation data are being produced that cannot be annotated effectively without access to considerable computational resources and genomic annotation databases. Often the most valuable information about a variant is the effect its observed alleles have on transcripts; this may aid the selection of variants for genotyping studies and, in turn, contribute to the discovery of new drug targets and other biologically significant loci. Deriving this information manually is laborious and error-prone, impractical for large datasets and impossible without access to suitable genomic annotation resources. Of critical interest are whether a dataset contains any novel variant positions and what information is available for known variant loci. Although these answers are available through dbSNP (Sherry et al., 1999), submitting data to the NCBI for processing and annotation can often take months and requires the data to be made public.

Even before the developments reported in this article, Ensembl (Flicek et al., 2010) could also be used to derive similar annotation by setting up a full local Ensembl database containing the variant information and running scripts from the Ensembl Variation database production pipeline. Other tools available for the annotation of single nucleotide polymorphisms (SNPs) in humans are comprehensively reviewed in Karchin (2008).

Existing methods of deriving the effects of variants can be limiting: many present too high a hurdle in terms of timeframe, privacy or ease of use; others are limited to particular species. To address this, Ensembl has been extended to include an easy-to-use web-based tool for deriving variant consequences, as well as programmatic access to the same functionality through the Ensembl Perl API.

2 IMPLEMENTATION

2.1 SNP Effect Predictor

The Ensembl project provides access to genomic annotation for numerous species via its web-based genome browser, as well as programmatic access via the object-oriented Perl API. The SNP Effect Predictor tool on the Ensembl website, accessed via the ‘Manage your data’ link on any species-specific Ensembl page (e.g. http://www.ensembl.org/Homo_sapiens/), uses the API calls described below to provide access to consequence prediction functionality without the need for writing code. The SNP Effect Predictor can be used for all species within Ensembl, including those with no existing variation dataset.

Users upload lists of variant positions and alleles via an HTML form. Input for each variant consists simply of a chromosome (or contig name in the absence of assembled chromosomes), start and end coordinates, a strand designation and a set of alleles. Users can then select text or HTML output, the latter incorporating hyperlinks to loci, transcripts and genes in the Ensembl genome browser. The output includes Ensembl stable identifiers for the relevant transcript and gene, transcript-relative coordinates, possible amino acids and the identifiers of any existing variants that are co-located with the user-defined variant. Because a variant may co-locate with more than one transcript, one line of output is provided for each instance of co-location. Consequence types predicted by Ensembl are shown in transcript context in Figure 1, with further detail provided at http://www.ensembl.org/info/docs/variation/index.html.
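The per-variant input described above can be illustrated with a small parser. This is a minimal sketch under stated assumptions, not the Ensembl upload code: the whitespace-delimited column order (chromosome, start, end, alleles, strand), the slash-separated allele string, the accepted strand codes and the convention that an insertion is written with end one less than start are all assumptions made for illustration, and `VariantInput`, `parse_variant_line` and `parse_variants` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed strand codes; the text only says a "strand designation" is supplied.
STRAND_CODES = {"+": 1, "1": 1, "-": -1, "-1": -1}


@dataclass(frozen=True)
class VariantInput:
    """One user-supplied variant (hypothetical representation)."""
    chrom: str                # chromosome, or contig name if no assembled chromosomes
    start: int                # start coordinate
    end: int                  # end coordinate
    alleles: Tuple[str, ...]  # e.g. ("C", "T"); "-" marks an absent allele
    strand: int               # +1 or -1


def parse_variant_line(line: str) -> VariantInput:
    """Parse one 'chrom start end alleles strand' line (assumed layout)."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    chrom, start, end, alleles, strand = fields[:5]
    if strand not in STRAND_CODES:
        raise ValueError(f"unrecognised strand {strand!r}")
    start_i, end_i = int(start), int(end)
    # Assumed convention: an insertion is written with end = start - 1,
    # so only an end further left than that is rejected.
    if end_i < start_i - 1:
        raise ValueError(f"end {end_i} precedes start {start_i}")
    return VariantInput(chrom, start_i, end_i, tuple(alleles.split("/")), STRAND_CODES[strand])


def parse_variants(text: str) -> List[VariantInput]:
    """Parse every non-blank, non-comment line of an uploaded file."""
    return [parse_variant_line(ln) for ln in text.splitlines()
            if ln.strip() and not ln.lstrip().startswith("#")]


# Made-up example records: a substitution on each strand and an insertion.
example = """\
# chrom start end alleles strand
1 1000 1000 C/T +
5 2500 2500 A/G -1
3 5001 5000 -/A +
"""

for variant in parse_variants(example):
    print(variant)
```

Each parsed record would then be compared against every overlapping transcript, giving one output line per co-located transcript as described above.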

Fig. 1. Consequence types predicted by Ensembl in the context of transcript structure. The other types shown apply to non-protein coding genes.
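Placing a variant within transcript structure, as depicted in Fig. 1, can be sketched as an interval classification of a single genomic position. This is a hypothetical simplification, not Ensembl's code: the `Transcript` model, the label strings and the 5000 bp upstream/downstream window (`FLANK`) are assumptions, and splice sites, regulatory regions and multi-base variants are ignored. Here "intergenic" only means "outside the flanking window of this one transcript".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

FLANK = 5000  # assumed upstream/downstream window in bp; illustrative only


@dataclass
class Transcript:
    """Simplified transcript model (hypothetical, not an Ensembl object)."""
    strand: int                     # +1 or -1
    exons: List[Tuple[int, int]]    # 1-based inclusive genomic (start, end) pairs
    cds: Optional[Tuple[int, int]]  # genomic (start, end) of the coding region; None if non-coding

    @property
    def start(self) -> int:
        return min(s for s, _ in self.exons)

    @property
    def end(self) -> int:
        return max(e for _, e in self.exons)


def classify_position(pos: int, tr: Transcript) -> str:
    """Return an illustrative consequence label for one genomic position."""
    if pos < tr.start or pos > tr.end:
        before = pos < tr.start
        dist = tr.start - pos if before else pos - tr.end
        if dist > FLANK:
            return "intergenic"
        # On the reverse strand, lower coordinates lie downstream of the transcript.
        is_upstream = before if tr.strand == 1 else not before
        return "upstream" if is_upstream else "downstream"
    if not any(s <= pos <= e for s, e in tr.exons):
        return "intronic"
    if tr.cds is None:
        return "within_non_coding_gene"
    cds_start, cds_end = tr.cds
    if cds_start <= pos <= cds_end:
        return "coding"
    # Exonic but outside the CDS: a UTR, whose 5'/3' side depends on strand.
    left_of_cds = pos < cds_start
    is_five_prime = left_of_cds if tr.strand == 1 else not left_of_cds
    return "5_prime_utr" if is_five_prime else "3_prime_utr"


fwd = Transcript(strand=1, exons=[(1000, 1200), (1500, 1800)], cds=(1100, 1600))
for pos in (900, 1050, 1150, 1300, 1700, 1900, 10_000):
    print(pos, classify_position(pos, fwd))
```

A position falling in a coding exon would then go on to the codon comparison step.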

User-uploaded variants can subsequently be viewed at their locations in the Ensembl browser, with each uploaded file given its own track in the browser's location view.

2.2 Ensembl API

The Ensembl API can be installed on any operating system that supports Perl and MySQL, and can be configured to use any combination of local or remote databases. The Ensembl Variation API (Chen et al., 2010; Rios et al., 2010) retrieves variation data such as SNPs, insertions and deletions from Ensembl databases. Entities such as variants are represented as objects, which are created by adaptors acting as factories for specific object types. Example code demonstrating the use of the API to derive consequences for a list of variant positions is shown in Supplementary Figure 1. Documentation on the API is available at http://www.ensembl.org/info/docs/Pdoc/ensembl-variation/index.html.
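The adaptor-as-factory arrangement can be illustrated with a short sketch, written in Python rather than Perl. Everything here is a hypothetical stand-in: the `Variant` and `VariantAdaptor` classes, the `fetch_all_by_region` method, the table layout and the in-memory SQLite database only mimic the pattern of an adaptor that owns database access and hands back ready-made objects; they are not the Ensembl schema or Perl API.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Variant:
    """A variant object produced by the adaptor (hypothetical)."""
    name: str
    chrom: str
    start: int
    end: int
    allele_string: str


class VariantAdaptor:
    """Hypothetical adaptor: wraps a database handle and builds Variant objects."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def fetch_all_by_region(self, chrom: str, start: int, end: int) -> List[Variant]:
        # Overlap test: the variant starts before the region ends and ends after it starts.
        rows = self._conn.execute(
            "SELECT name, chrom, pos_start, pos_end, allele_string FROM variant "
            "WHERE chrom = ? AND pos_start <= ? AND pos_end >= ? ORDER BY pos_start",
            (chrom, end, start),
        )
        return [Variant(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variant (name TEXT, chrom TEXT, pos_start INT, "
             "pos_end INT, allele_string TEXT)")
conn.executemany("INSERT INTO variant VALUES (?, ?, ?, ?, ?)", [
    ("var1", "1", 100, 100, "A/G"),    # made-up records for illustration
    ("var2", "1", 250, 252, "ACG/-"),
    ("var3", "2", 100, 100, "C/T"),
])

adaptor = VariantAdaptor(conn)
for v in adaptor.fetch_all_by_region("1", 90, 260):
    print(v)
```

Calling code never writes SQL itself; it asks the adaptor for objects, which is what allows the same client code to run against local or remote databases.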

Given a variant position, the API retrieves overlapping transcripts from the Ensembl Core database and determines where in the transcript structure the variant falls. If the variant falls within an exon, new codons for each variant allele are derived and compared to the reference codon. The location of the variant relative to regulatory regions is also assessed using the Ensembl Functional Genomics database where available. The results, including amino acid changes and relative positions in the cDNA and peptide sequences, are stored in the resulting transcript variation objects, along with one or more named consequence types.
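The codon comparison step can be sketched for a single-nucleotide substitution inside a coding sequence. This is a simplified illustration, not the Ensembl implementation: it assumes the standard genetic code, a complete CDS string already on the transcript strand and a variant base already complemented for reverse-strand transcripts. Insertions, deletions, frameshifts and incomplete codons are not handled, and the consequence labels are illustrative.

```python
from itertools import product
from typing import Dict, Tuple

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def snv_coding_effect(cds: str, cds_pos: int, alt: str) -> Tuple[str, str, str]:
    """Compare the reference and variant codons for a single-base change.

    cds     -- coding sequence on the transcript strand
    cds_pos -- 1-based position of the variant within the CDS
    alt     -- variant base on the transcript strand
    Returns (reference amino acid, variant amino acid, illustrative label).
    """
    cds, alt = cds.upper(), alt.upper()
    if not 1 <= cds_pos <= len(cds):
        raise ValueError("position outside CDS")
    if len(alt) != 1 or alt not in "ACGT":
        raise ValueError("only single-base substitutions are handled")
    idx = cds_pos - 1
    offset = idx % 3                      # position within the codon (0, 1 or 2)
    ref_codon = cds[idx - offset:idx - offset + 3]
    if len(ref_codon) != 3:
        raise ValueError("incomplete codon")
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        label = "synonymous_coding"
    elif alt_aa == "*":
        label = "stop_gained"
    elif ref_aa == "*":
        label = "stop_lost"
    else:
        label = "non_synonymous_coding"
    return ref_aa, alt_aa, label


cds = "ATGGCTTGGTAA"  # made-up CDS encoding M A W stop
for pos, base in ((6, "C"), (4, "A"), (9, "A"), (12, "C")):
    print(pos, base, snv_coding_effect(cds, pos, base))
```

The returned amino acids correspond to the "possible amino acids" column of the web output; an allele-by-allele loop over a variant's alleles would call this once per non-reference allele.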

At present Ensembl provides only a Perl API, but the open-source nature of the project has enabled Python APIs to be created (PyCogent, http://pycogent.sourceforge.net/examples/query_ensembl.html; PyGr, http://code.google.com/p/pygr/wiki/PygrOnEnsembl). As yet, these do not encompass the full scope of Ensembl and hence do not include consequence prediction functionality.

3 RESULTS

The SNP Effect Predictor tool can be used to predict the effects of variants on Ensembl-annotated transcripts quickly and accurately. Up to 750 variant loci can be uploaded in a file to http://www.ensembl.org/. Within a species, the time taken to return results scales linearly with the number of variants uploaded; calculation time also varies between species depending on the number of transcripts. A file containing 750 variants in Homo sapiens takes ∼35 s to return results; an equivalent calculation in Danio rerio takes 20 s. Users with more than 750 variants may download a standalone script that runs locally and produces identical results. The script can be configured to connect to the public Ensembl database as well as to any combination of local and remote databases. It also supports a wider range of input file formats, including the commonly used pileup variant format.
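The linear-scaling statement allows a back-of-the-envelope estimate: at ∼35 s per 750 human variants, each variant costs roughly 35/750 ≈ 0.047 s. The sketch below applies that per-variant rate and counts how many web-sized uploads a larger list would need. It assumes the quoted timings extrapolate linearly; the species keys and function names are hypothetical, and real times may differ.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

WEB_LIMIT = 750  # maximum variants per web upload, as stated in the text

# Timings quoted in the text for a 750-variant file (seconds); keys are hypothetical.
SECONDS_PER_WEB_FILE = {"homo_sapiens": 35.0, "danio_rerio": 20.0}


def batches(items: Sequence[T], size: int = WEB_LIMIT) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items."""
    for i in range(0, len(items), size):
        yield list(items[i:i + size])


def estimated_web_seconds(n_variants: int, species: str) -> float:
    """Rough estimate assuming run time scales linearly with variant count."""
    per_variant = SECONDS_PER_WEB_FILE[species] / WEB_LIMIT
    return n_variants * per_variant


variants = list(range(2000))  # placeholder for 2000 parsed variants
print([len(b) for b in batches(variants)])
print(round(estimated_web_seconds(len(variants), "homo_sapiens"), 1))
```

Once a dataset needs several uploads, the standalone script described above is the more practical route.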

The provision of a simple web interface to powerful algorithms that transparently process large data volumes is a valuable asset to users without computing expertise, and also to those who need a quick and easy way to retrieve annotation for novel variants. Having this tool integrated with the extensive, rich annotation available on the Ensembl website will facilitate interpretation and analysis of the data.

Direct use of the Ensembl Variation API enables users to incorporate consequence prediction into their own variation software and pipelines, providing predicted consequences for an unlimited number of variants. By optimizing the code and reducing database access times, it is possible to retrieve consequences for 1000 distinct variants in H. sapiens in <30 s; for D. rerio this takes <15 s.
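One generic way to reduce database access time is to cache overlapping-transcript lookups by genomic bin, so that nearby variants share a single query. This is offered as an assumption, not as a description of the optimizations the authors made. The bin size, the in-memory `TRANSCRIPTS` table standing in for a database and the function names are all hypothetical; `QUERY_COUNT` simply records how many simulated queries were issued.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

BIN_SIZE = 100_000  # hypothetical bin width in bp

QUERY_COUNT = 0  # number of simulated database queries issued

# Hypothetical stand-in for transcripts stored in a database: name -> (chrom, start, end)
TRANSCRIPTS: Dict[str, Tuple[str, int, int]] = {
    "tx_a": ("1", 1_000, 50_000),
    "tx_b": ("1", 40_000, 120_000),
    "tx_c": ("2", 5_000, 9_000),
}


@lru_cache(maxsize=None)
def transcripts_in_bin(chrom: str, bin_index: int) -> Tuple[Tuple[str, int, int], ...]:
    """Simulated database query, executed at most once per (chrom, bin)."""
    global QUERY_COUNT
    QUERY_COUNT += 1
    lo, hi = bin_index * BIN_SIZE + 1, (bin_index + 1) * BIN_SIZE
    return tuple((name, s, e) for name, (c, s, e) in TRANSCRIPTS.items()
                 if c == chrom and s <= hi and e >= lo)


def overlapping_transcripts(chrom: str, pos: int) -> List[str]:
    """Names of transcripts overlapping one position, via the cached bin lookup."""
    candidates = transcripts_in_bin(chrom, (pos - 1) // BIN_SIZE)
    return [name for name, s, e in candidates if s <= pos <= e]


variants = [("1", 45_000), ("1", 45_010), ("1", 110_000), ("2", 6_000)]
results = [overlapping_transcripts(chrom, pos) for chrom, pos in variants]
print(results)
print("queries issued:", QUERY_COUNT)
```

Four variants trigger only three lookups here because the first two fall in the same bin; sorting input by position makes such reuse more likely.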

The flexibility of the Ensembl API means that consequences can be predicted for any species with an Ensembl gene set, or using any valid Ensembl database on users' own systems. Using these features in combination with other parts of the API enables the creation of advanced pipelines that can produce biologically important information from high-throughput experimental data. Such information is invaluable both for screening variants and for studying phenotypically linked variants.

Funding: Wellcome Trust; European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 – the GEN2PHEN project.

Conflict of Interest: none declared.

REFERENCES

Chen et al. (2010) Ensembl variation resources. BMC Genomics, 11, 238.

Flicek et al. (2010) Ensembl's 10th year. Nucleic Acids Res., 38, D557–D562.

Karchin (2008) Next generation tools for the annotation of human SNPs. Brief Bioinformatics, 10, 35–52.

Rios et al. (2010) A database and API for variation, dense genotyping and resequencing data. BMC Bioinformatics, 11, 293.

Sherry et al. (1999) dbSNP—database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome Res., 9, 677–679.

Author notes

Associate Editor: Alfonso Valencia

© The Author(s) 2010. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
