A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3
Pablo Cingolani et al. Fly (Austin). 2012 Apr-Jun.
Abstract
We describe a new computer program, SnpEff, for rapidly categorizing the effects of variants in genome sequences. Once a genome is sequenced, SnpEff annotates variants based on their genomic locations and predicts coding effects. Annotated genomic locations include intronic, untranslated region, upstream, downstream, splice site, or intergenic regions. Coding effects such as synonymous or non-synonymous amino acid replacement, start codon gains or losses, stop codon gains or losses, or frame shifts can be predicted. Here the use of SnpEff is illustrated by annotating ~356,660 candidate SNPs in ~117 Mb of unique sequence, representing a substitution rate of ~1/305 nucleotides, between the Drosophila melanogaster w(1118); iso-2; iso-3 strain and the reference y(1); cn(1) bw(1) sp(1) strain. We show that ~15,842 SNPs are synonymous and ~4,467 SNPs are non-synonymous (N/S ~0.28). The remaining SNPs are in other categories, such as stop codon gains (38 SNPs), stop codon losses (8 SNPs), and start codon gains (297 SNPs) in the 5'UTR. We found, as expected, that the SNP frequency is proportional to the recombination frequency (i.e., highest in the middle of chromosome arms). We also found that start-gain or stop-lost SNPs in Drosophila melanogaster often result in additions of N-terminal or C-terminal amino acids that are conserved in other Drosophila species. It appears that the 5' and 3' UTRs are reservoirs for genetic variation that changes the termini of proteins during evolution of the Drosophila genus. As genome sequencing is becoming inexpensive and routine, SnpEff enables rapid analyses of whole-genome sequencing data to be performed by an individual laboratory.
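The coding-effect categories named in the abstract can be illustrated with a minimal Python sketch. This is not SnpEff's implementation: the function names `classify_codon_change` and `ns_ratio` are hypothetical, the sketch assumes the standard nuclear genetic code, and it looks only at a single in-frame codon (it ignores transcripts, strand, splice sites, start codons, and frame shifts).

```python
# Illustrative sketch only; NOT SnpEff's actual code.
# Assumption: standard nuclear genetic code, codons given 5'->3' on the coding strand.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-entry codon table; '*' marks a stop codon.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_codon_change(ref_codon, alt_codon):
    """Classify a codon substitution by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"      # same amino acid (or stop stays stop)
    if alt_aa == "*":
        return "stop_gained"     # coding codon becomes a stop codon
    if ref_aa == "*":
        return "stop_lost"       # stop codon becomes a coding codon
    return "non_synonymous"      # amino acid replacement


def ns_ratio(non_synonymous, synonymous):
    """Ratio of non-synonymous to synonymous SNP counts."""
    return non_synonymous / synonymous
```

For example, `classify_codon_change("TGG", "TGA")` returns `"stop_gained"`, and `ns_ratio(4467, 15842)` evaluates to roughly 0.28, which agrees with the N/S value quoted in the abstract.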
Figures
Figure 1. Classification of SNPs in w1118; iso-2; iso-3. The number of SNPs in each class is shown above the bar. The quality score was arbitrarily set at 70 and above for this graph.
Figure 2. Analysis of Eip63E start-gained SNP in w1118; iso-2; iso-3. (A) Location of the start-gained SNP at the Eip63E locus. Notice that the reading frame is the same as the normal translation start site (TSS). (B) Conservation of the 60-amino-acid N-terminal region of Eip63E in w1118; iso-2; iso-3 with the Drosophila yakuba orthologous gene. The other sequenced Drosophila species do not have this N-terminal sequence (not shown).
Figure 3. Oc/Otd has two stop-gained SNPs in w1118; iso-2; iso-3. (A) Location of the two stop-gained SNPs in oc/otd. (B) Protein BLAST of Oc/Otd against the non-redundant (nr) protein database shows that only the 60-amino-acid Hox domain flanking amino acid 100 is conserved from Drosophila to humans. The color coding shows the alignment scores.
Figure 4. CG34326 has one stop-gained SNP in w1118; iso-2; iso-3 in the non-conserved C-terminal region. (A) Protein BLAST of CG34326 against the non-redundant (nr) protein database shows that only the 38 N-terminal amino acids are conserved among Drosophila species and not beyond Drosophila. The colored lines represent the homologs from the following organisms: Drosophila melanogaster, Drosophila grimshawi, Drosophila yakuba, Drosophila erecta, Drosophila virilis, Ixodes scapularis, Ixodes scapularis, and Nycticebus coucang. (B) Alignment of Drosophila melanogaster CG34326 with the orthologous gene from Drosophila grimshawi. (C) Alignment of Drosophila melanogaster CG34326 with the orthologous gene from Drosophila yakuba.
Figure 5. CG13958 has a stop-lost SNP in w1118; iso-2; iso-3. The top comparison shows the alignment of the Drosophila melanogaster reference genome with w1118; iso-2; iso-3. Notice that the stop-lost SNP causes an extension of nine amino acids. The second through sixth comparisons show the alignment of Drosophila simulans, Drosophila erecta, Drosophila yakuba, Drosophila mojavensis, and Drosophila pseudoobscura pseudoobscura (Sbjct) with the Drosophila melanogaster reference genome (Dm-ref). The number of terminal amino acids missing or gained is shown (-1 to +3).
Figure 6. Nonsynonymous to synonymous ratios along the chromosome arms in w1118; iso-2; iso-3. (A) Left, nonsynonymous SNPs at 1 Mbp intervals along the 2L chromosome arm (black) and synonymous SNPs (gray). Right, N/S ratios (NS/Syn) along the chromosome arms. Notice that N/S ratios are higher near the centromere and telomere (see text). (B-F) As in (A), but for chromosome arms 2R, 3L, 3R, 4 and X.
Similar articles
- Massively parallel resequencing of the isogenic Drosophila melanogaster strain w(1118); iso-2; iso-3 identifies hotspots for mutations in sensory perception genes.
Platts AE, Land SJ, Chen L, Page GP, Rasouli P, Wang L, Lu X, Ruden DM. Fly (Austin). 2009 Jul-Sep;3(3):192-203. doi: 10.4161/fly.3.3.9652. Epub 2009 Jul 23. PMID: 19690466. Free PMC article.
- Genome features of "Dark-fly", a Drosophila line reared long-term in a dark environment.
Izutsu M, Zhou J, Sugiyama Y, Nishimura O, Aizu T, Toyoda A, Fujiyama A, Agata K, Fuse N. PLoS One. 2012;7(3):e33288. doi: 10.1371/journal.pone.0033288. Epub 2012 Mar 14. PMID: 22432011. Free PMC article.
- Drosophila melanogaster: a case study of a model genomic sequence and its consequences.
Ashburner M, Bergman CM. Genome Res. 2005 Dec;15(12):1661-7. doi: 10.1101/gr.3726705. PMID: 16339363. Review.
- Evolutionary insights from large scale resequencing datasets in Drosophila melanogaster.
Guirao-Rico S, González J. Curr Opin Insect Sci. 2019 Feb;31:70-76. doi: 10.1016/j.cois.2018.11.002. Epub 2018 Nov 20. PMID: 31109676. Review.
Cited by
- Insights into genomic sequence diversity of the SAG surface antigen superfamily in geographically diverse Eimeria tenella isolates.
Kiang AL, Loo SS, Mat-Isa MN, Ng CL, Blake DP, Wan KL. Sci Rep. 2024 Nov 1;14(1):26251. doi: 10.1038/s41598-024-77580-7. PMID: 39482455. Free PMC article.
- A Survey of Compound Heterozygous Variants in Pediatric Cancers and Structural Birth Defects.
Miller DB, Piccolo SR. Front Genet. 2021 Mar 22;12:640242. doi: 10.3389/fgene.2021.640242. eCollection 2021. PMID: 33828584. Free PMC article.
- Evolutionary dynamics of Vibrio cholerae O1 following a single-source introduction to Haiti.
Katz LS, Petkau A, Beaulaurier J, Tyler S, Antonova ES, Turnsek MA, Guo Y, Wang S, Paxinos EE, Orata F, Gladney LM, Stroika S, Folster JP, Rowe L, Freeman MM, Knox N, Frace M, Boncy J, Graham M, Hammer BK, Boucher Y, Bashir A, Hanage WP, Van Domselaar G, Tarr CL. mBio. 2013 Jul 2;4(4):e00398-13. doi: 10.1128/mBio.00398-13. PMID: 23820394. Free PMC article.
- Identification of a genetic variant underlying familial cases of recurrent benign paroxysmal positional vertigo.
Xu Y, Zhang Y, Lopez IA, Hilbers J, Griswold AJ, Ishiyama A, Blanton S, Liu XZ, Lundberg YW. PLoS One. 2021 May 6;16(5):e0251386. doi: 10.1371/journal.pone.0251386. eCollection 2021. PMID: 33956893. Free PMC article.
- Mutations in DVL1 cause an osteosclerotic form of Robinow syndrome.
Bunn KJ, Daniel P, Rösken HS, O'Neill AC, Cameron-Christie SR, Morgan T, Brunner HG, Lai A, Kunst HP, Markie DM, Robertson SP. Am J Hum Genet. 2015 Apr 2;96(4):623-30. doi: 10.1016/j.ajhg.2015.02.010. Epub 2015 Mar 26. PMID: 25817014. Free PMC article.
Grants and funding
- P30 ES06639/ES/NIEHS NIH HHS/United States
- R01 ES012933/ES/NIEHS NIH HHS/United States
- P30 ES006639/ES/NIEHS NIH HHS/United States
- R01 DK071073/DK/NIDDK NIH HHS/United States
- R21 ES021983/ES/NIEHS NIH HHS/United States