Application of massive parallel sequencing to whole genome SNP discovery in the porcine genome

Andreia J Amaral et al. BMC Genomics. 2009.

Abstract

Background: Although the Illumina 1 G Genome Analyzer generates billions of base pairs of sequence data, selecting sequences is challenging because sequence quality varies. Therefore, in the framework of the International Porcine SNP Chip Consortium, this pilot study aimed to evaluate how the quality level of the sequenced bases affects mapping quality and the identification of true SNPs on a large scale.

Results: DNA pooled from five animals from a commercial boar line was digested with DraI; fragments of 150-250 bp were isolated and end-sequenced using the Illumina 1 G Genome Analyzer, yielding 70,348,064 sequences, each 36 bp long. Rules were developed to select sequences, which were then aligned to unique positions in a reference genome. Sequences were selected based on quality, and three thresholds of sequence quality (SQ) were compared. The highest SQ threshold allowed identification of a larger number of SNPs (17,489), distributed widely across the pig genome. In total, 3,142 SNPs were validated with a success rate of 96%. The correlation between estimated minor allele frequency (MAF) and genotyped MAF was moderate, and SNPs were highly polymorphic in other pig breeds. Lowering the SQ threshold while maintaining the same criteria for SNP identification resulted in the discovery of fewer SNPs (16,768), of which 259 were not identified using higher SQ levels. Validation of SNPs found exclusively at the lower SQ threshold had a success rate of 94% and showed a low correlation between estimated MAF and genotyped MAF. Base change analysis suggested that the rate of transitions in the pig genome is likely to be similar to that observed in humans. Chromosome X showed reduced nucleotide diversity relative to autosomes, as observed for other species.
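The steps summarized above (quality-based sequence selection, MAF estimation from pooled reads, and base change analysis) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not define how SQ is computed, so the rule used here (every base in a read must reach the threshold) is an assumption, and the function names and toy reads are hypothetical. Reads are shortened to three bases for brevity.

```python
from collections import Counter

# Purines (A, G) and pyrimidines (C, T): a change within one class is a transition.
TRANSITION_PAIRS = {frozenset("AG"), frozenset("CT")}


def passes_quality(quals, threshold):
    """Assumed selection rule: every base quality in the read must reach the threshold."""
    return all(q >= threshold for q in quals)


def estimate_maf(allele_counts):
    """Estimate MAF from read counts; returns None unless exactly two alleles are seen."""
    if len(allele_counts) != 2:
        return None
    return min(allele_counts.values()) / sum(allele_counts.values())


def maf_at_site(reads, threshold):
    """Count alleles carried by reads that pass the filter, then estimate MAF."""
    kept = Counter(base for base, quals in reads if passes_quality(quals, threshold))
    return estimate_maf(kept)


def is_transition(ref, alt):
    """True for A<->G or C<->T changes, False for transversions."""
    return frozenset((ref, alt)) in TRANSITION_PAIRS


# Hypothetical reads covering one site: (base observed at the site, read qualities).
reads = [
    ("A", [30, 25, 22]),
    ("G", [28, 21, 20]),
    ("A", [35, 30, 30]),
    ("G", [12, 30, 30]),  # dropped at threshold 20 because of the base with quality 12
    ("A", [25, 25, 25]),
]

strict_maf = maf_at_site(reads, threshold=20)
relaxed_maf = maf_at_site(reads, threshold=10)
print(strict_maf, relaxed_maf)
print(is_transition("A", "G"), is_transition("A", "C"))
```

In this toy example the two thresholds keep different sets of reads and so give different MAF estimates for the same site, which shows how the choice of SQ threshold can shift MAF estimates from pooled data.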

Conclusion: Large numbers of SNPs can be identified reliably by creating strict rules for sequence selection, which simultaneously decreases sequence ambiguity. Selection of sequences using a higher SQ threshold leads to more reliable identification of SNPs. Lower SQ thresholds can be used to guarantee sufficient sequence coverage, giving a high validation success rate but less reliable MAF estimates. Nucleotide diversity varies between porcine chromosomes, with the X chromosome showing less variation, as observed in other species.

Figures

Figure 1

Maximum mapping quality (MMQ), the mapping quality of the best-mapped sequence of a cluster, at an SNP position versus target coverage. Box plots show the data distribution for each parameter. Red dots show MMQ values for the best-mapped sequence at an SNP position versus target coverage. The black solid line shows the smooth-fit line.

Figure 2

Venn diagram showing the number of identical SNPs between the analyzed data sets with different levels of sequence quality.

Figure 3

Number of identified SNPs per position in a short read for Data 20.

Figure 4

SNP map of each chromosome based on Data 20. The colored vertical lines represent the location of each SNP.

Figure 5

Sequence coverage, nucleotide diversity, and SNP occurrence along chromosome 1. Each bar represents a window of 1 Mb. Red bars show the length of the aligned consensus sequence, blue bars show the estimated level of nucleotide diversity, and green bars show the number of SNPs found in each window. The red triangle marks the position of the centromere. The blue triangle marks a position where nucleotide diversity is high but coverage is low.
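The 1 Mb windowing described in this caption can be sketched in Python as follows. This is an illustration only, not the authors' method: it counts SNPs per window and flags windows where SNPs occur despite little aligned sequence, and it does not reproduce the nucleotide diversity estimator, which the abstract does not specify. Positions are assumed to be 0-based, and the SNP positions, covered lengths, and `min_covered` cutoff are hypothetical.

```python
from collections import Counter

WINDOW_BP = 1_000_000  # 1 Mb windows, as in the caption


def snps_per_window(positions, window=WINDOW_BP):
    """Map window index -> number of SNPs whose position falls in that window."""
    counts = Counter(pos // window for pos in positions)
    return dict(sorted(counts.items()))


def low_coverage_windows(snp_counts, covered_bp, min_covered):
    """Windows that contain SNPs but have less aligned sequence than min_covered."""
    return [
        w for w, n in snp_counts.items()
        if n > 0 and covered_bp.get(w, 0) < min_covered
    ]


# Hypothetical SNP positions on one chromosome (0-based coordinates).
snp_positions = [150_000, 900_000, 1_200_000, 5_400_000, 5_450_000, 5_999_999]

# Hypothetical length of aligned consensus sequence per window, in bp.
covered = {0: 40_000, 1: 35_000, 5: 2_000}

counts = snps_per_window(snp_positions)
flagged = low_coverage_windows(counts, covered, min_covered=10_000)
print(counts)
print(flagged)
```

Windows flagged this way correspond to the situation marked by the blue triangle, where an apparently high diversity value rests on very little aligned sequence and should be read with caution.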
