Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor
1European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and 2Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
* To whom correspondence should be addressed.
Revision received:
27 May 2010
William McLaren, Bethan Pritchard, Daniel Rios, Yuan Chen, Paul Flicek, Fiona Cunningham, Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor, Bioinformatics, Volume 26, Issue 16, August 2010, Pages 2069–2070, https://doi.org/10.1093/bioinformatics/btq330
Abstract
Summary: A tool to predict the effect that newly discovered genomic variants have on known transcripts is indispensable in prioritizing and categorizing such variants. In Ensembl, a web-based tool (the SNP Effect Predictor) and an API can now functionally annotate variants in all Ensembl and Ensembl Genomes supported species.
Availability: The Ensembl SNP Effect Predictor can be accessed via the Ensembl website at http://www.ensembl.org/. The Ensembl API (http://www.ensembl.org/info/docs/api/api_installation.html for installation instructions) is open source software.
Contact: wm2@ebi.ac.uk; fiona@ebi.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
1 INTRODUCTION
As costs of resequencing and genotyping fall, increasing amounts of variation data are being produced that cannot be annotated effectively without access to considerable computational resources and genomic annotation databases. Often the most valuable information to know about a variant is the effect the observed alleles have on transcripts, which may aid selection of variations for genotyping studies and in turn have a part to play in the discovery of new drug targets and other biologically significant loci. Deriving this information manually is laborious and error-prone, impractical for large sets of data and impossible without access to suitable genomic annotation resources. Of critical interest is the existence of any novel variant positions within a dataset, and what information is available for known variant loci. Although these answers are available through dbSNP (Sherry et al., 1999), the process of submitting the data to the NCBI to be processed and annotated can often take months and requires the data to be made public.
Even before the developments reported in this article, Ensembl (Flicek et al., 2010) could also be used to derive similar annotation by setting up a full local Ensembl database containing the variant information and running scripts from the Ensembl Variation database production pipeline. Other tools available for the annotation of single nucleotide polymorphisms (SNPs) in humans are comprehensively reviewed in Karchin (2008).
Existing methods of deriving the effects of variants can be limiting: many present too high a hurdle in terms of timeframe, privacy or ease of use; others are limited to particular species. To address this, Ensembl has been extended to include an easy-to-use web-based tool for deriving variation consequences, as well as programmatic access to the same functionality using the Ensembl Perl API.
2 IMPLEMENTATION
2.1 SNP Effect Predictor
The Ensembl project provides access to genomic annotation for numerous species via its web-based genome browser, as well as programmatic access via the object-oriented Perl API. The SNP Effect Predictor tool on the Ensembl website, accessed via the ‘Manage your data’ link on any species-specific Ensembl page (e.g. http://www.ensembl.org/Homo_sapiens/), uses the API calls described below to provide access to consequence prediction functionality without the need for writing code. The SNP Effect Predictor can be used for all species within Ensembl, including those with no existing variation dataset.
Users upload lists of variant positions and alleles via an HTML form page. Input for each variant consists simply of a chromosome (or contig name in the absence of assembled chromosomes), start and end coordinates, strand designation and a set of alleles. Users can then select text or HTML formatted output, the latter incorporating hyperlinks to loci, transcripts and genes in the Ensembl genome browser. The output includes: Ensembl stable identifiers for the relevant transcript and gene; transcript-relative coordinates; possible amino acids; and the identifier of any existing variants that are co-located with the user-defined variant. Since a variant may co-locate with more than one transcript, one line of output is provided for each instance of co-location. Consequence types predicted by Ensembl are shown in transcript context in Figure 1, with further detail provided at http://www.ensembl.org/info/docs/variation/index.html.
Fig. 1.
Consequence types predicted by Ensembl in the context of transcript structure. The other types shown apply to non-protein coding genes.
User uploaded variations can subsequently be viewed in the context of their location on the Ensembl browser, with each uploaded file given its own track on the browser's location view.
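The per-variant input described in Section 2.1 (chromosome or contig, start, end, alleles and strand) can be illustrated with a small parser. This is a minimal sketch rather than the tool's own code: the whitespace-separated column order, the "/" allele separator, the accepted strand symbols and the names `Variant` and `parse_variant_line` are all assumptions made for illustration.

```python
from typing import NamedTuple, Tuple


class Variant(NamedTuple):
    """One uploaded variant (hypothetical container, not an Ensembl class)."""
    chrom: str                 # chromosome or contig name
    start: int                 # start coordinate
    end: int                   # end coordinate
    alleles: Tuple[str, ...]   # e.g. ("T", "C")
    strand: int                # 1 for forward, -1 for reverse


# Assumed strand notations; the real upload form may accept others.
_STRANDS = {"+": 1, "-": -1, "1": 1, "-1": -1}


def parse_variant_line(line: str) -> Variant:
    """Parse 'chrom start end allele/allele strand' into a Variant."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    chrom, start, end, allele_string, strand = fields
    if strand not in _STRANDS:
        raise ValueError(f"unrecognised strand: {strand!r}")
    alleles = tuple(allele_string.upper().split("/"))
    if len(alleles) < 2:
        raise ValueError(f"need at least two alleles: {allele_string!r}")
    return Variant(chrom, int(start), int(end), alleles, _STRANDS[strand])
```

For example, `parse_variant_line("5 140532 140532 T/C +")` yields a `Variant` on chromosome 5 with alleles `("T", "C")` on the forward strand. Validating each line before upload makes malformed rows fail early instead of silently producing no output.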
2.2 Ensembl API
The Ensembl API can be installed on any operating system that supports Perl and MySQL, and can be configured to use any combination of local or remote databases. The Ensembl Variation API (Chen et al., 2010; Rios et al., 2010) exists to retrieve variation data such as SNPs, insertions and deletions from Ensembl databases. Entities such as variants are represented as objects, created by adaptors that act as factories for generating specific objects. Example code demonstrating the use of the API to derive consequences for a list of variant positions is shown in Supplementary Figure 1. Documentation on the API is found at http://www.ensembl.org/info/docs/Pdoc/ensembl-variation/index.html.
Given a variant position, the API retrieves overlapping transcripts from the Ensembl Core database and determines where in the transcript structure the variant falls. If the variant falls within an exon, new codons for each variant allele are derived and compared to the reference codon. The location of the variant relative to regulatory regions is also assessed using the Ensembl Functional Genomics database where available. The results, including amino acid changes and relative positions in the cDNA and peptide sequences, are stored in the resulting transcript variation objects, along with one or more named consequence types.
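The codon comparison step described above can be sketched independently of the Ensembl API. The following Python example is an illustration under stated assumptions, not the API's implementation: it takes a coding sequence already extracted on the transcript strand, a 1-based position within it and a substituted base, and compares the reference and variant amino acids using the standard genetic code. The function name `codon_consequence` is hypothetical, the labels are modelled on Ensembl's named consequence types, and only single-base substitutions are handled.

```python
# Standard genetic code, built from the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def codon_consequence(cds: str, cds_pos: int, alt_base: str):
    """Return (ref_aa, alt_aa, label) for a substitution in a coding sequence.

    cds      -- coding sequence on the transcript strand (A/C/G/T only)
    cds_pos  -- 1-based position of the variant within the CDS
    alt_base -- the variant allele (a single base)
    """
    if not 1 <= cds_pos <= len(cds):
        raise ValueError("position lies outside the coding sequence")
    codon_start = (cds_pos - 1) // 3 * 3          # 0-based start of the codon
    ref_codon = cds[codon_start:codon_start + 3].upper()
    if len(ref_codon) < 3:
        raise ValueError("variant falls in an incomplete final codon")
    offset = (cds_pos - 1) % 3                    # position within the codon
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]

    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        label = "SYNONYMOUS_CODING"
    elif alt_aa == "*":
        label = "STOP_GAINED"
    elif ref_aa == "*":
        label = "STOP_LOST"
    else:
        label = "NON_SYNONYMOUS_CODING"
    return ref_aa, alt_aa, label
```

Calling the function once per variant allele mirrors the statement that new codons are derived for each allele and compared with the reference codon. Mapping a genomic coordinate to a CDS position, handling insertions and deletions, and assessing regulatory regions are outside the scope of this sketch.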
At present Ensembl provides only a Perl API, but the open source nature of the project has enabled the creation of Python APIs (PyCogent, http://pycogent.sourceforge.net/examples/query_ensembl.html; PyGr, http://code.google.com/p/pygr/wiki/PygrOnEnsembl). As yet, these do not encompass the full scope of Ensembl, and hence do not include consequence prediction functionality.
3 RESULTS
The SNP Effect Predictor tool can be used to quickly and accurately predict the effects of variants on Ensembl-annotated transcripts. Up to 750 variant loci can be uploaded in a file to http://www.ensembl.org/, with the time taken to return results scaling linearly with the number of variants uploaded within a species; calculation time will also vary by species depending on the number of transcripts. A file containing 750 variants in Homo sapiens takes ∼35 s to return results; an equivalent calculation in Danio rerio takes 20 s. Users with more than 750 variants may download a standalone script to run locally that produces identical results. The script can be configured to connect to the public Ensembl database as well as to any combination of local and remote databases. A wider range of input file formats is also supported, including the commonly used pileup variant format.
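Because the reported run time scales linearly with the number of variants within a species, a rough estimate for a smaller upload can be obtained by proportion. The sketch below assumes strict proportionality to the quoted web figures (750 variants in about 35 s for H.sapiens) and ignores any fixed overhead; the function name `estimate_seconds` is hypothetical.

```python
def estimate_seconds(n_variants: int,
                     ref_variants: int = 750,
                     ref_seconds: float = 35.0) -> float:
    """Linearly scale a reference timing to n_variants (illustrative only)."""
    if n_variants < 0 or ref_variants <= 0:
        raise ValueError("variant counts must be non-negative and non-zero")
    return n_variants * ref_seconds / ref_variants
```

Under these assumptions, 300 human variants would take about 14 s, and the same number in D.rerio (using the quoted 20 s per 750 variants) about 8 s.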
The provision of a simple web interface to powerful algorithms that transparently process large data volumes is a valuable asset to users without computing expertise, and also to those who need a quick and easy way to retrieve annotation for novel variants. Having this tool integrated with the extensive, rich annotation available on the Ensembl website will facilitate interpretation and analysis of the data.
Direct use of the Ensembl Variation API enables users to incorporate consequence prediction into their variation software and pipelines, providing predicted consequences for an unlimited number of variants. By optimizing code and database access times it is possible to retrieve consequences for 1000 distinct variants in H.sapiens in <30 s; for D.rerio this takes <15 s.
The flexibility of the Ensembl API means that consequences can be predicted for any species with an Ensembl gene set, or using any valid Ensembl database on users' own systems. Using these features in combination with others in the API enables the creation of advanced pipelines that can produce biologically important information from high-throughput experimental data. Such information is invaluable both as a screening system for variants and as an aid in the study of phenotypically linked variants.
Funding: Wellcome Trust; European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 – the GEN2PHEN project.
Conflict of Interest: none declared.
REFERENCES
Chen et al. Ensembl variation resources, BMC Genomics, 2010, vol. 11 pg. 238
Flicek et al. Ensembl's 10th year, Nucleic Acids Res., 2010, vol. 38 (pg. D557-D562)
Karchin. Next generation tools for the annotation of human SNPs, Brief Bioinformatics, 2008, vol. 10 (pg. 35-52)
Rios et al. A database and API for variation, dense genotyping and resequencing data, BMC Bioinformatics, 2010, vol. 11 pg. 293
Sherry et al. dbSNP—database for single nucleotide polymorphisms and other classes of minor genetic variation, Genome Res., 1999, vol. 9 (pg. 677-679)
Author notes
Associate Editor: Alfonso Valencia
© The Author(s) 2010. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.