Quality control and preprocessing of metagenomic datasets

Robert Schmieder et al. Bioinformatics. 2011.

Abstract

Summary: Here, we present PRINSEQ for easy and rapid quality control and data preprocessing of genomic and metagenomic datasets. Summary statistics of FASTA (and QUAL) or FASTQ files are generated in tabular and graphical form and sequences can be filtered, reformatted and trimmed by a variety of options to improve downstream analysis.
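The kind of processing described above (summary statistics, trimming and filtering of FASTQ data) can be illustrated with a minimal Python sketch. This is not PRINSEQ's code or interface: the function names, the thresholds `min_qual` and `min_len`, the Phred+33 quality encoding, the strict four-line record layout and the two example reads are all assumptions made for illustration.

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (name, sequence, quality_scores) from an iterable of FASTQ lines.

    Assumes strict 4-line records and Phred+33 quality encoding.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, ignored
        qual = next(it).rstrip("\n")
        yield header[1:], seq, [ord(c) - 33 for c in qual]


def trim_right(seq, quals, min_qual):
    """Drop trailing bases whose quality score is below min_qual."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_qual:
        end -= 1
    return seq[:end], quals[:end]


def preprocess(records, min_qual=20, min_len=4):
    """Trim each read from the 3' end, then keep reads of at least min_len bases."""
    kept = []
    for name, seq, quals in records:
        seq, quals = trim_right(seq, quals, min_qual)
        if len(seq) >= min_len:
            kept.append((name, seq, quals))
    return kept


def summarize(records):
    """Return read count, mean read length and overall GC fraction."""
    lengths = [len(seq) for _, seq, _ in records]
    total = sum(lengths)
    gc = sum(seq.count("G") + seq.count("C") for _, seq, _ in records)
    return {
        "count": len(records),
        "mean_length": mean(lengths) if lengths else 0.0,
        "gc_fraction": gc / total if total else 0.0,
    }


# Hypothetical two-read example ('I' = quality 40, '#' = quality 2 in Phred+33).
EXAMPLE = """@read1
ACGTACGT
+
IIIIII##
@read2
GGCC
+
####
""".splitlines()

if __name__ == "__main__":
    raw = list(parse_fastq(EXAMPLE))
    print("before:", summarize(raw))
    print("after: ", summarize(preprocess(raw)))
```

Comparing the summary before and after preprocessing shows how trimming low-quality ends and discarding short reads changes the dataset that reaches downstream analysis.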

Availability and implementation: This open-source application was implemented in Perl and can be used as a standalone version or accessed online through a user-friendly web interface. The source code, user help and additional information are available at http://prinseq.sourceforge.net/.
