AdapterRemoval: easy cleaning of next-generation sequencing reads

Stinus Lindgreen. BMC Res Notes. 2012.

Abstract

Background: With the advent of next-generation sequencing there is an increased demand for tools to pre-process and handle the vast amounts of data generated. One recurring problem is adapter contamination in the reads, i.e. the partial or complete sequencing of adapter sequences. These adapter sequences have to be removed as they can hinder correct mapping of the reads and influence SNP calling and other downstream analyses.

Findings: We present a tool called AdapterRemoval which is able to pre-process both single-end and paired-end data. The program locates and removes adapter residues from the reads, can combine paired reads if they overlap, and can optionally trim low-quality nucleotides. Furthermore, it can look for adapter sequence at both the 5' and 3' ends of the reads. This is a flexible tool that can be tuned to accommodate different experimental settings and sequencing platforms producing FASTQ files. AdapterRemoval is shown to trim adapters effectively from both single-end and paired-end data.

Conclusions: AdapterRemoval is a comprehensive tool for analyzing next-generation sequencing data. It exhibits good performance both in terms of sensitivity and specificity. AdapterRemoval has already been used in various large projects and it is possible to extend it further to accommodate application-specific biases in the data.

Figures

Figure 1

Illustration of different constructs and the reads produced. Single-end data on top, paired-end below. Inserts are denoted $I$, single-end reads $R$, and paired-end reads $R_1$ and $R_2$. Read length is denoted $L_R$, insert length $L_I$. A) $L_I \geq L_R$: no adapter contamination. B) $L_I < L_R$: adapter contamination occurs in the 3’ end. C) $L_I \geq 2 \cdot L_R$: no adapter contamination and no overlap between reads. D) $L_R < L_I < 2 \cdot L_R$: no adapter contamination, but the two reads overlap. E) $L_I < L_R$: adapter contamination in the 3’ ends of both reads, overlap between the 5’ ends of the reads. This information can be used to perform the pairwise alignment needed (after reverse complementing mate 2 from the pair) to locate adapter contamination and/or overlap between reads.
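The cases in the Figure 1 caption can be sketched as a small classifier. This is only an illustration of the length comparisons, not code from AdapterRemoval: the function names are hypothetical, case A is assumed to be the complement of case B (insert length at least the read length), and the paired-end boundary where insert length equals read length is reported separately because the caption does not assign it to a case.

```python
# Hypothetical helpers illustrating the cases in the Figure 1 caption.
# None of these names come from AdapterRemoval itself.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement a DNA sequence (as done for mate 2 before alignment)."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify_single_end(insert_len: int, read_len: int) -> str:
    """Return 'A' (no adapter contamination) or 'B' (adapter in the 3' end).

    Assumption: case A covers insert_len >= read_len, the complement of case B.
    """
    return "A" if insert_len >= read_len else "B"


def classify_paired_end(insert_len: int, read_len: int) -> str:
    """Return 'C', 'D' or 'E' as described in the caption."""
    if insert_len >= 2 * read_len:
        return "C"  # no adapter contamination, mates do not overlap
    if insert_len > read_len:
        return "D"  # no adapter contamination, mates overlap
    if insert_len < read_len:
        return "E"  # adapter in the 3' ends of both mates, 5' ends overlap
    return "boundary"  # insert_len == read_len is not assigned in the caption


if __name__ == "__main__":
    for insert_len in (250, 150, 60):
        print(insert_len, classify_paired_end(insert_len, read_len=100))
```

With a read length of 100, inserts of 250, 150 and 60 nucleotides fall into cases C, D and E respectively.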

Figure 2

The need for shifting the alignment due to missing nucleotides. If the read is missing a few nucleotides in the 5’ end, the proper alignment will not be recoverable if the procedure stops at the first position. As shown in 1), this leads to multiple mismatches and possibly missed adapter contamination. If the alignment is shifted by S nucleotides as shown in 2), the correct alignment can be found. The dynamic programming matrix in 3) shows which entries in the matrix lead to the two solutions shown here. The light grey part is the upper half of the matrix that is calculated by default; the two dark grey entries illustrate the two alignments shown in 1) and 2).
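The effect of the shift described in the Figure 2 caption can be illustrated with a minimal sketch. It assumes a simple ungapped scoring scheme (+1 per match, -1 per mismatch, 0 for N) evaluated at every offset, which is a stand-in for, not a reproduction of, the dynamic programming matrix in the figure. The function name, the `max_shift` and `min_overlap` parameters, and the example sequences are all hypothetical.

```python
from typing import Optional, Tuple


def best_ungapped_offset(
    read: str, adapter: str, max_shift: int = 0, min_overlap: int = 3
) -> Tuple[Optional[int], int]:
    """Find the offset at which `adapter` aligns best against `read`.

    The offset is the read position aligned to adapter[0]. Negative offsets
    (down to -max_shift) let the start of the adapter hang off the 5' end of
    the read, which models a read that is missing a few leading nucleotides.
    Returns (offset, score), or (None, 0) if no offset scores above zero.
    """
    best_offset, best_score = None, 0
    for offset in range(-max_shift, len(read) - min_overlap + 1):
        score = 0
        overlap = 0
        for j, base in enumerate(adapter):
            i = offset + j
            if i < 0:
                continue  # adapter base falls before the start of the read
            if i >= len(read):
                break  # adapter extends past the end of the read
            overlap += 1
            if base == "N" or read[i] == "N":
                continue  # ambiguous bases neither reward nor penalise
            score += 1 if base == read[i] else -1
        if overlap >= min_overlap and score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # example sequence, not taken from the source
    read = adapter[2:] + "AC"  # read lacking the first two adapter bases
    print(best_ungapped_offset(read, adapter, max_shift=0))
    print(best_ungapped_offset(read, adapter, max_shift=2))
```

Without a shift, the search starts at the first read position and cannot place the adapter correctly. Allowing a shift of two nucleotides recovers the perfect 11-base alignment at offset -2.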
