A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3

Pablo Cingolani et al. Fly (Austin). 2012 Apr-Jun.

Abstract

We describe a new computer program, SnpEff, for rapidly categorizing the effects of variants in genome sequences. Once a genome is sequenced, SnpEff annotates variants based on their genomic locations and predicts coding effects. Annotated genomic locations include intronic, untranslated region, upstream, downstream, splice site, or intergenic regions. Coding effects such as synonymous or non-synonymous amino acid replacement, start codon gains or losses, stop codon gains or losses, or frame shifts can be predicted. Here the use of SnpEff is illustrated by annotating ~356,660 candidate SNPs in ~117 Mb unique sequences, representing a substitution rate of ~1/305 nucleotides, between the Drosophila melanogaster w(1118); iso-2; iso-3 strain and the reference y(1); cn(1) bw(1) sp(1) strain. We show that ~15,842 SNPs are synonymous and ~4,467 SNPs are non-synonymous (N/S ~0.28). The remaining SNPs are in other categories, such as stop codon gains (38 SNPs), stop codon losses (8 SNPs), and start codon gains (297 SNPs) in the 5'UTR. We found, as expected, that the SNP frequency is proportional to the recombination frequency (i.e., highest in the middle of chromosome arms). We also found that start-gain or stop-lost SNPs in Drosophila melanogaster often result in additions of N-terminal or C-terminal amino acids that are conserved in other Drosophila species. It appears that the 5' and 3' UTRs are reservoirs for genetic variations that change the termini of proteins during evolution of the Drosophila genus. As genome sequencing is becoming inexpensive and routine, SnpEff enables rapid analyses of whole-genome sequencing data to be performed by an individual laboratory.
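The coding-effect categories named in the abstract can be illustrated with a small, self-contained sketch. This is not SnpEff's implementation: the function names `classify_codon_change` and `gains_start_codon` are hypothetical, the standard genetic code is assumed, and strand, splicing, and transcript models are ignored. The last lines reproduce the N/S ratio from the counts reported in the abstract.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the standard nuclear code applies; '*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_change(ref_codon, alt_codon, is_first_codon=False):
    """Classify a single-codon substitution (hypothetical helper, not SnpEff's API)."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if is_first_codon and ref_codon == "ATG" and alt_codon != "ATG":
        return "start_lost"
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "non_synonymous"


def gains_start_codon(utr, pos, alt):
    """Return True if substituting `alt` at 0-based `pos` creates a new ATG
    overlapping that position in a 5'UTR sequence (forward strand only)."""
    utr = utr.upper()
    mutated = utr[:pos] + alt.upper() + utr[pos + 1:]
    for start in range(max(0, pos - 2), min(pos, len(utr) - 3) + 1):
        if mutated[start:start + 3] == "ATG" and utr[start:start + 3] != "ATG":
            return True
    return False


# N/S ratio from the counts reported in the abstract.
non_synonymous, synonymous = 4467, 15842
ns_ratio = non_synonymous / synonymous
print(f"N/S = {ns_ratio:.2f}")
```

A real annotator also has to map each variant onto every overlapping transcript and handle reverse-strand genes; the sketch only covers the final codon comparison.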

Figures

Figure 1. Classification of SNPs in _w_1118; _iso_-2; _iso_-3. The number of SNPs in each class is shown above the bar. The quality score was arbitrarily set at 70 and above for this graph.

Figure 2. Analysis of the Eip63E start-gained SNP in _w_1118; _iso_-2; _iso_-3. (A) Location of the start-gained SNP at the Eip63E locus. Notice that the reading frame is the same as that of the normal translation start site (TSS). (B) Conservation of the 60-amino-acid N-terminal region of Eip63E in _w_1118; _iso_-2; _iso_-3 with the orthologous Drosophila yakuba gene. The other sequenced Drosophila species do not have this N-terminal sequence (not shown).

Figure 3. Oc/Otd has two stop-gained SNPs in _w_1118; _iso_-2; _iso_-3. (A) Location of the two stop-gained SNPs in oc/otd. (B) Protein BLAST of Oc/Otd against the non-redundant (nr) protein database shows that only the 60-amino-acid Hox domain flanking amino acid 100 is conserved from Drosophila to humans. The color coding shows the alignment scores.

Figure 4. CG34326 has one stop-gained SNP in _w_1118; _iso_-2; _iso_-3 in the non-conserved C-terminal region. (A) Protein BLAST of CG34326 against the non-redundant (nr) protein database shows that only the 38 N-terminal amino acids are conserved among Drosophila species and not beyond Drosophila. The colored lines represent the homologs from the following organisms: Drosophila melanogaster, Drosophila grimshawi, Drosophila yakuba, Drosophila erecta, Drosophila virilis, Ixodes scapularis, Ixodes scapularis, and Nycticebus coucang. (B) Alignment of Drosophila melanogaster CG34326 with the orthologous gene from Drosophila grimshawi. (C) Alignment of Drosophila melanogaster CG34326 with the orthologous gene from Drosophila yakuba.

Figure 5. CG13958 has a stop-lost SNP in _w_1118; _iso_-2; _iso_-3. The top comparison shows the alignment of the Drosophila melanogaster reference genome with _w_1118; _iso_-2; _iso_-3. Notice that the stop-lost SNP causes an extension of nine amino acids. The second through sixth comparisons show the alignment of Drosophila simulans, Drosophila erecta, Drosophila yakuba, Drosophila mojavensis, and Drosophila pseudoobscura pseudoobscura (Sbjct) with the Drosophila melanogaster reference genome (Dm-ref). The number of terminal amino acids missing or gained is shown (-1 to +3).

Figure 6. Nonsynonymous to synonymous ratios along the chromosome arms in _w_1118; _iso_-2; _iso_-3. (A) Left, nonsynonymous SNPs (black) and synonymous SNPs (gray) at 1 Mbp intervals along the 2L chromosome arm. Right, N/S ratios (NS/Syn) along the chromosome arms. Notice that N/S ratios are higher near the centromere and telomere (see text). (B-F) As in (A), but for chromosome arms 2R, 3L, 3R, 4 and X.
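The per-interval N/S ratios described for Figure 6 can be sketched as a simple binning step. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name `ns_ratio_by_window` is hypothetical, the input is assumed to be (position, effect) pairs for one chromosome arm, and the example positions are made-up toy data.

```python
from collections import Counter


def ns_ratio_by_window(snps, window=1_000_000):
    """Bin SNPs into fixed-size windows along one chromosome arm and return
    {window_index: N/S ratio}. The ratio is None when a window has no
    synonymous SNPs. Hypothetical helper for illustration only."""
    nonsyn, syn = Counter(), Counter()
    for position, effect in snps:
        index = position // window  # 0-based window index
        if effect == "non_synonymous":
            nonsyn[index] += 1
        elif effect == "synonymous":
            syn[index] += 1
    ratios = {}
    for index in sorted(set(nonsyn) | set(syn)):
        ratios[index] = nonsyn[index] / syn[index] if syn[index] else None
    return ratios


# Toy input (invented positions, not data from the paper).
toy_snps = [
    (100, "synonymous"),
    (200, "synonymous"),
    (300, "non_synonymous"),
    (1_500_000, "non_synonymous"),
]
print(ns_ratio_by_window(toy_snps))
```

Windows with few synonymous SNPs give noisy ratios, so a real analysis would likely report counts alongside each ratio.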
