BS Seeker: precise mapping for bisulfite sequencing - PubMed (original) (raw)

BS Seeker: precise mapping for bisulfite sequencing

Pao-Yang Chen et al. BMC Bioinformatics. 2010.

Abstract

Background: Bisulfite sequencing using next generation sequencers yields genome-wide measurements of DNA methylation at single nucleotide resolution. Traditional aligners are not designed for mapping bisulfite-treated reads, where the unmethylated Cs are converted to Ts. We have developed BS Seeker, an approach that converts the genome to a three-letter alphabet and uses Bowtie to align bisulfite-treated reads to a reference genome. It uses sequence tags to reduce mapping ambiguity. Post-processing of the alignments removes non-unique and low-quality mappings.

Results: We tested our aligner on synthetic data, a bisulfite-converted Arabidopsis library, and human libraries generated from two different experimental protocols. We evaluated the performance of our approach and compared it to other bisulfite aligners. The results demonstrate that among the aligners tested, BS Seeker is more versatile and faster. When mapping to the human genome, BS Seeker generates alignments significantly faster than RMAP and BSMAP. Furthermore, BS Seeker is the only alignment tool that can explicitly account for tags which are generated by certain library construction protocols.

Conclusions: BS Seeker provides fast and accurate mapping of bisulfite-converted reads. It can work with BS reads generated from the two different experimental protocols, and is able to efficiently map reads to large mammalian genomes. The Python program is freely available at http://pellegrini.mcdb.ucla.edu/BS\_Seeker/BS\_Seeker.html.

PubMed Disclaimer

Figures

Figure 1

Figure 1

The two library protocols generating bisulfite-converted reads. Cokus et al's experimental protocol uses two amplification steps for generating bisulfite-converted sequences and for high throughput sequencing. The bisulfite-converted reads are preceded by one of two tags in the first 5 nucleotides of reads. Lister et al's protocol generates bisulfite libraries using premethylated adapters, and in this case no tags are present.

Figure 2

Figure 2

Schematic diagrams of the 4 forms of BS reads, mapping and post processing. 2A. BS reads may be in one of the 4 forms: +FW, +RC, -FW, -RC. 2B. Bowtie aligns C/T converted reads to the C/T converted strands. During the post processing, the number of mismatches is counted except those between read Ts and genomic Cs. Low-quality mappings with many mismatches are removed.

Similar articles

Cited by

References

    1. Frommer M, McDonald LE, Millar DS, Collis CM, Watt F, Grigg GW, Molloy PL, Paul CL. A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands. Proc Natl Acad Sci USA. 1992;89(5):1827–1831. doi: 10.1073/pnas.89.5.1827. - DOI - PMC - PubMed
    1. Cokus SJ, Feng S, Zhang X, Chen Z, Merriman B, Haudenschild CD, Pradhan S, Nelson SF, Pellegrini M, Jacobsen SE. Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature. 2008;452(7184):215–219. doi: 10.1038/nature06745. - DOI - PMC - PubMed
    1. Lister R, O'Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH, Ecker JR. Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell. 2008;133(3):523–536. doi: 10.1016/j.cell.2008.03.029. - DOI - PMC - PubMed
    1. Meissner A, Mikkelsen TS, Gu H, Wernig M, Hanna J, Sivachenko A, Zhang X, Bernstein BE, Nusbaum C, Jaffe DB, Gnirke A, Jaenisch R, Lander ES. Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature. 2008;454(7205):766–770. - PMC - PubMed
    1. Smith ZD, Gu H, Bock C, Gnirke A, Meissner A. High-throughput bisulfite sequencing in mammalian genomes. Methods. 2009;48(3):226–232. doi: 10.1016/j.ymeth.2009.05.003. - DOI - PMC - PubMed

MeSH terms

Substances

LinkOut - more resources