Development and characterisation of an expressed sequence tags (EST)-derived single nucleotide polymorphisms (SNPs) resource in rainbow trout

Mekki Boussaha et al. BMC Genomics. 2012.

Abstract

Background: There is considerable interest in developing high-throughput genotyping with single nucleotide polymorphisms (SNPs) for the identification of genes affecting important ecological or economic traits. SNPs are evenly distributed throughout the genome and are likely to be functionally relevant. In rainbow trout, in silico screening of EST databases represents an attractive approach for de novo SNP identification. Nevertheless, EST sequencing errors and the assembly of paralogous EST sequences can lead to the identification of false-positive SNPs, which makes EST-derived SNPs relatively unreliable. Further validation of EST-derived SNPs is therefore required. The objective of this work was to assess the quality of, and to validate, a large number of rainbow trout EST-derived SNPs.

Results: A panel of 1,152 EST-derived SNPs was selected from the INRA Sigenae SNP database and was genotyped in standard and double haploid individuals from several populations using the Illumina GoldenGate BeadXpress assay. High-quality genotyping data were obtained for 958 SNPs, a genotyping success rate of 83.2%. Of these, 350 SNPs (36.5%) were polymorphic in at least one population and were designated as true SNPs. They also proved to be a potential tool for investigating the genetic diversity of the species, as the set of SNPs successfully sorted individuals into three main groups using STRUCTURE software. Functional annotation revealed 28 non-synonymous SNPs, of which four substitutions were predicted to affect protein function. A subset of 223 true SNPs was polymorphic in the two INRA mapping reference families and was integrated into the INRA microsatellite-based linkage map.
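The percentages in the Results paragraph follow directly from the reported counts. The short sketch below simply redoes that arithmetic; the variable names and the `percent` helper are illustrative and are not taken from the paper.

```python
# Counts as reported in the Results paragraph.
panel_size = 1152     # EST-derived SNPs selected for genotyping
high_quality = 958    # SNPs with high-quality genotyping data
polymorphic = 350     # SNPs polymorphic in at least one population ("true SNPs")


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


success_rate = percent(high_quality, panel_size)    # genotyping success rate
true_snp_rate = percent(polymorphic, high_quality)  # share of true SNPs

# Derived here, not stated in the abstract: high-quality SNPs that were
# monomorphic in every genotyped population.
monomorphic = high_quality - polymorphic

print(success_rate, true_snp_rate, monomorphic)
```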

Conclusions: Our results represent the first study of EST-derived SNP validation in rainbow trout, a species whose genome sequence is not yet available. We designed several specific filters in order to improve the genotyping yield. Nevertheless, our selection criteria should be further improved in order to reduce the observed high rate of false-positive SNPs, which results from the occurrence of whole genome duplications.

Figures

Figure 1

Selection of the validation panel. Summary of the filters used to select EST-derived SNPs for validation from the INRA Sigenae SNP database, release som10.

Figure 2

Distribution of observed heterozygosity for true SNPs in three populations. SNPs were clustered into categories based on their observed heterozygosity values.

Figure 3

Frequency distribution of PIC values across the three genotyped populations. SNPs were clustered into categories based on their observed PIC values.
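As a rough illustration of the two per-SNP statistics shown in Figures 2 and 3, the sketch below computes observed heterozygosity and the polymorphism information content (PIC) from a list of genotype calls. It assumes the commonly used Botstein et al. (1980) PIC formula; the abstract does not say which formula or software the authors used, and the function name, input format and example genotypes are hypothetical.

```python
from collections import Counter


def snp_stats(genotypes):
    """Return (observed heterozygosity, PIC) for one SNP.

    genotypes: list of two-character calls such as "AA", "AG", "GG";
    missing calls are given as None or "" and are ignored.
    """
    called = [g for g in genotypes if g]
    if not called:
        raise ValueError("no called genotypes for this SNP")

    # Observed heterozygosity: fraction of called individuals that are heterozygous.
    ho = sum(1 for g in called if g[0] != g[1]) / len(called)

    # Allele frequencies from the called genotypes (two alleles per individual).
    counts = Counter(allele for g in called for allele in g)
    total = 2 * len(called)
    freqs = [c / total for c in counts.values()]

    # PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2 (assumed formula).
    sum_sq = sum(p * p for p in freqs)
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return ho, 1 - sum_sq - cross


# Made-up example: both alleles at frequency 0.5, half the individuals heterozygous.
ho, pic = snp_stats(["AA", "AG", "AG", "GG", None])
```

For a biallelic SNP the PIC under this formula cannot exceed 0.375, which is reached when both alleles have frequency 0.5, as in the example.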

Figure 4

Prediction of the best value of K. Delta K analysis was performed as previously described (Evanno, 2005) in order to predict the best value of K.
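The Delta K statistic can be sketched as follows, assuming the usual form in which the absolute second-order difference of the mean log-likelihood L(K) across replicate runs is divided by the standard deviation of L(K). The log-likelihood values below are made up for illustration and are not the paper's data; the function name and input layout are likewise hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute Delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K to a list of log-likelihood values,
    one per replicate run (at least two runs per K).
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # second-order difference is undefined at the ends
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # avoid division by zero when replicates are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


# Hypothetical replicate log-likelihoods for K = 1..5.
runs = {
    1: [-5000, -5002, -4998],
    2: [-4500, -4504, -4496],
    3: [-4200, -4201, -4199],
    4: [-4190, -4230, -4150],
    5: [-4185, -4260, -4110],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
```

With these invented values the peak falls at K = 3, the point where the likelihood stops improving sharply and the replicate runs are still tightly clustered.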

Figure 5

Genetic population structure predicted by STRUCTURE software. Genetic population structure was inferred with the STRUCTURE software using K = 3, a burn-in period of 50,000 iterations and 200,000 iterations for the likelihood estimation. Individuals 1 to 20 correspond to the INRA-SP population, individuals 21 to 40 to the INRA-SY population and individuals 41 to 84 to the NCCCWA population.
